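I wrote a piece of code that counts how many times the value in a given column increases (in this example, the column is Anxiety).
Counting code: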
len([b-a for a,b in zip(df['Anxiety'],df['Anxiety'][1:]) if b>a])
Setup code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Account':[123,123,123,123,123,123,123,123,123,123,456,456,456,456],
                   'Anxiety':[0,1,np.nan,2,3,0,2,np.nan,np.nan,0,0,1,np.nan,3]})
df
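However, there are two problems. First, it does not distinguish between accounts. Second, if there is a null between two values, the increase is not counted correctly.
The expected output is 4 for account 123 and 2 for account 456.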
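To illustrate the second problem, here is a minimal sketch using only the Anxiety values of account 123 from the setup code. Any comparison against NaN evaluates to False, so both pairs touching a null are skipped and the increase from 1 to 2 is lost. The variable names are just for illustration.

```python
import math

# Anxiety values for account 123, copied from the setup code
values = [0, 1, math.nan, 2, 3, 0, 2, math.nan, math.nan, 0]

# Pairwise comparison of neighbours: any comparison with NaN is False
naive = sum(b > a for a, b in zip(values, values[1:]))

# Dropping the nulls first compares each value with the previous non-null one
clean = [v for v in values if not math.isnan(v)]
expected = sum(b > a for a, b in zip(clean, clean[1:]))

print(naive, expected)  # 3 4
```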
Here is one way to do it:
#create a temp column 'diff' by taking a difference from previous row (excluding NaN), where difference is positive
# using groupby to sum the positive differences from previous rows
df.assign(
    diff=(df[df['Anxiety'].notna()]['Anxiety'].diff()>0).astype(int)
).groupby('Account')['diff'].sum()
Account
123 4.0
456 2.0
Name: diff, dtype: float64
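One caveat with the snippet above: diff() runs over the whole column before the groupby, so the first non-null row of an account is compared with the last non-null row of the previous account. That is harmless for this data (0 followed by 0), but it could add a spurious increase on other data. A sketch of a variant that takes the difference inside each account, assuming the rows are already in the order you want to compare:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Account': [123] * 10 + [456] * 4,
                   'Anxiety': [0, 1, np.nan, 2, 3, 0, 2, np.nan, np.nan, 0,
                               0, 1, np.nan, 3]})

# Drop null readings, then take differences within each account only
valid = df.dropna(subset=['Anxiety'])
steps = valid.groupby('Account')['Anxiety'].diff()

# Count the positive steps per account (boolean sum gives integers)
counts = steps.gt(0).groupby(valid['Account']).sum()
print(counts)
```

Note that an account whose values are all null would not appear in the result at all, since its rows are dropped before grouping.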
Try:
def n_incr(g):
    return (g.ffill().diff() > 0).sum()
>>> df.groupby('Account').agg(n_incr)
Anxiety
Account
123 4
456 2
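To see why the forward fill helps, here is a sketch of the intermediate steps for the values of account 123. Each null takes the previous value, so its difference is 0 and is not counted, while the next real value is still compared against the last non-null one.

```python
import numpy as np
import pandas as pd

# Anxiety values for account 123, copied from the setup code
s = pd.Series([0, 1, np.nan, 2, 3, 0, 2, np.nan, np.nan, 0])

filled = s.ffill()           # nulls replaced by the previous value
diffs = filled.diff()        # first element is NaN, filled nulls give 0
n = int((diffs > 0).sum())   # NaN > 0 is False, so it is not counted

print(filled.tolist())
print(n)  # 4
```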
Similarly:
out = df[df['Anxiety'].notna()].groupby('Account')['Anxiety'].apply(
    lambda x: x[x > x.shift()].size)
print(out):
Account
123 4
456 2
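If you want to convince yourself that the approaches in this thread agree on the sample data, a quick check along these lines should do. It is only a sketch: the names by_diff, by_ffill and by_shift are made up here, and agreeing on this data does not prove the approaches agree on every input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Account': [123] * 10 + [456] * 4,
                   'Anxiety': [0, 1, np.nan, 2, 3, 0, 2, np.nan, np.nan, 0,
                               0, 1, np.nan, 3]})

# Approach 1: diff over non-null rows, then sum per account
by_diff = df.assign(
    diff=(df[df['Anxiety'].notna()]['Anxiety'].diff() > 0).astype(int)
).groupby('Account')['diff'].sum()

# Approach 2: forward fill inside each account, then count positive diffs
by_ffill = df.groupby('Account')['Anxiety'].agg(
    lambda g: (g.ffill().diff() > 0).sum())

# Approach 3: drop nulls, then compare each value with the shifted one
by_shift = df[df['Anxiety'].notna()].groupby('Account')['Anxiety'].apply(
    lambda x: x[x > x.shift()].size)

print(by_diff.tolist(), by_ffill.tolist(), by_shift.tolist())
```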