Return the count of score increases by account

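I wrote a piece of code that counts how many times the value in a particular column (in this case, the column is Anxiety) increases.

Counting code: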

len([b-a for a,b in zip(df['Anxiety'],df['Anxiety'][1:]) if b>a])

Setup code:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Account':[123,123,123,123,123,123,123,123,123,123,456,456,456,456],
                   'Anxiety':[0,1,np.nan,2,3,0,2,np.nan,np.nan,0,0,1,np.nan,3]})
df

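However, there are two problems. First, it does not distinguish between accounts. Second, it does not count correctly when a null value sits between two values.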

The expected output is 4 for account 123 and 2 for account 456.
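To see where the count falls short, the original one-liner can be run against the setup data. The sketch below is illustrative only; the name `naive_count` is introduced here and is not part of the original post. Any comparison involving NaN evaluates to False, so a rise that spans a null row is dropped, and the whole column is treated as one sequence regardless of account.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Account': [123] * 10 + [456] * 4,
    'Anxiety': [0, 1, np.nan, 2, 3, 0, 2, np.nan, np.nan, 0, 0, 1, np.nan, 3],
})

# The original one-liner: compares each row with the next across the whole column.
naive_count = len([b - a for a, b in zip(df['Anxiety'], df['Anxiety'][1:]) if b > a])

# Comparisons against NaN are always False, so 1 -> NaN -> 2 is never counted.
print(np.nan > 1, 2 > np.nan)   # False False
print(naive_count)              # 4, versus the 6 expected in total (4 + 2)
```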

Here is one way to do it:

# Drop NaN rows, then take the difference from the previous non-null value
# within each account, so a difference never spans two accounts.
# Flag positive differences as 1 in a temp column 'diff', then sum per account.
df.assign(
    diff=(df.dropna(subset=['Anxiety']).groupby('Account')['Anxiety'].diff() > 0).astype(int)
).groupby('Account')['diff'].sum()

Account
123    4.0
456    2.0
Name: diff, dtype: float64

Try:

def n_incr(g):
    # forward-fill gaps within the group, then count the positive steps
    return (g.ffill().diff() > 0).sum()

>>> df.groupby('Account').agg(n_incr)
         Anxiety
Account         
123            4
456            2
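The intermediate steps of the forward-fill approach can be traced on a single account. This is a minimal sketch using account 123's values from the setup data; the variable names `s`, `filled`, `steps`, and `n` are illustrative. Forward-filling turns each gap into a repeat of the last seen value, so the gap itself contributes a difference of 0 and the next real value ends up compared with the last non-null one.

```python
import numpy as np
import pandas as pd

# Account 123's Anxiety values from the setup data
s = pd.Series([0, 1, np.nan, 2, 3, 0, 2, np.nan, np.nan, 0])

filled = s.ffill()            # each NaN takes the last seen value
steps = filled.diff()         # change from the previous row (first row is NaN)
n = int((steps > 0).sum())    # count only the positive changes

print(filled.tolist())  # [0.0, 1.0, 1.0, 2.0, 3.0, 0.0, 2.0, 2.0, 2.0, 0.0]
print(n)                # 4
```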

Similarly:

out = df[df['Anxiety'].notna()].groupby('Account')['Anxiety'].apply(
    lambda x: x[x > x.shift()].size)

print(out):

Account
123    4
456    2
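Whichever variant is used, the comparison with the previous value should stay inside each account. As a rough sketch, the idea can be wrapped in a helper; `count_increases` and the small `check` frame are hypothetical, made up for this illustration rather than taken from the answers.

```python
import numpy as np
import pandas as pd

def count_increases(frame, group_col, value_col):
    """Count, per group, how often a value exceeds the previous non-null value."""
    valid = frame.dropna(subset=[value_col])
    # diff() on the grouped column restarts at each group, so the first row
    # of a group is NaN and never counts as an increase.
    rises = valid.groupby(group_col)[value_col].diff() > 0
    return rises.groupby(valid[group_col]).sum()

# Hypothetical data: B starts higher than A ends, which a global diff would miscount.
check = pd.DataFrame({
    'Account': ['A', 'A', 'A', 'B', 'B'],
    'Anxiety': [1, np.nan, 2, 5, 6],
})
result = count_increases(check, 'Account', 'Anxiety')
print(result.to_dict())  # {'A': 1, 'B': 1}
```

A difference taken over the whole filtered column would treat the step from 2 (the last value of A) to 5 (the first value of B) as an increase and credit it to B.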
