groupby.col.diff gives an unexpected error


import pandas as pd

df = pd.DataFrame(
    {'ts': [1,2,3,4,60,61,62,63,64,150,155,156,
            1,2,3,4,60,61,62,63,64,150,155,156,
            1,2,3,4,60,61,62,63,64,150,155,156],
     'id': [1,2,3,4,60,61,62,63,64,150,155,156,
            71,72,73,74,80,81,82,83,64,160,165,166,
            21,22,23,24,90,91,92,93,94,180,185,186],
     'other': ['x','x','x','x','x','x','x','x','x','x','x','x',
               'y','y','y','y','y','y','y','y','y','y','y','y',
               'z','z','z','z','z','z','z','z','z','z','z','z'],
     'user': ['x','x','x','x','y','x','x','x','x','x','x','x',
              'y','y','y','y','x','y','y','y','y','y','y','y',
              'z','z','z','z','z','z','z','z','z','z','z','z']
     })

df.set_index('id', inplace=True)
df.sort_values('ts',inplace=True)

for x, g in df.groupby('user'):
    # call 1
    print(g.ts.diff())
# call 2
df.groupby('user').ts.diff()
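The answer below does not explain why sorting matters, but one detail in the question's data is worth checking: the `id` column used as the index is not unique (64 appears twice, once for user x and once for user y). Some older pandas versions raised errors such as "cannot reindex from a duplicate axis" from groupby operations on a duplicated index; that this is what breaks call 2 is an assumption, not verified against the asker's pandas version. A quick check:

```python
import pandas as pd

# The 'id' values from the question, in the same order.
ids = [1, 2, 3, 4, 60, 61, 62, 63, 64, 150, 155, 156,
       71, 72, 73, 74, 80, 81, 82, 83, 64, 160, 165, 166,
       21, 22, 23, 24, 90, 91, 92, 93, 94, 180, 185, 186]
idx = pd.Index(ids, name='id')

print(idx.is_unique)                   # False
print(idx[idx.duplicated()].tolist())  # [64]
```

If the duplicate is a data error, fixing it (or keeping a default RangeIndex) removes this variable from the problem.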

I'm not sure why I get an error in call 2. I also noticed that call 2 works when I remove sort_values.

Can someone explain this behavior?

I get the error whether or not sort_values is called. In any case, I think what you are looking for is:

df['group_diff'] = df.ts.groupby(df.user).transform(pd.Series.diff)
>>> df.head()
    other  ts user  group_diff
id
1       x   1    x         NaN
2       x   2    x         1.0
3       x   3    x         1.0
4       x   4    x         1.0
60      x  60    y         NaN
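A self-contained version of the answer's approach, rebuilding the question's data compactly and skipping sort_values so that df.head() lines up with the output above. It also computes the built-in df.groupby('user')['ts'].diff() for comparison; that the two agree is what current pandas does, but treat it as an assumption for older versions.

```python
import pandas as pd

# The question's data, built compactly (same values, same row order).
ts = [1, 2, 3, 4, 60, 61, 62, 63, 64, 150, 155, 156] * 3
ids = [1, 2, 3, 4, 60, 61, 62, 63, 64, 150, 155, 156,
       71, 72, 73, 74, 80, 81, 82, 83, 64, 160, 165, 166,
       21, 22, 23, 24, 90, 91, 92, 93, 94, 180, 185, 186]
other = ['x'] * 12 + ['y'] * 12 + ['z'] * 12
user = (['x'] * 4 + ['y'] + ['x'] * 7
        + ['y'] * 4 + ['x'] + ['y'] * 7
        + ['z'] * 12)

df = pd.DataFrame({'ts': ts, 'id': ids, 'other': other, 'user': user})
df.set_index('id', inplace=True)  # no sort_values, to match the output above

# The answer's approach: diff within each user group via transform.
df['group_diff'] = df.ts.groupby(df.user).transform(pd.Series.diff)

# Built-in groupby diff, for comparison.
direct = df.groupby('user')['ts'].diff()

print(df.head())
```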

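After the groupby, you call transform, which applies a function to each group and returns one value for every row of that group, aligned back to the original rows. Here that function is simply pd.Series.diff.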
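To make the transform step concrete, here is a sketch on a small made-up Series (not the question's data): transform(pd.Series.diff) gives the same values as diffing each group by hand and putting the pieces back in the original row order.

```python
import pandas as pd

# Made-up example: values from two groups, interleaved.
ts = pd.Series([1, 5, 2, 9, 4], name='ts')
user = pd.Series(['a', 'b', 'a', 'b', 'a'], name='user')

# transform: run pd.Series.diff on each group, one output per input row,
# returned in the original row order.
via_transform = ts.groupby(user).transform(pd.Series.diff)

# The same thing by hand: diff each group separately, then restore row order.
pieces = [g.diff() for _, g in ts.groupby(user)]
by_hand = pd.concat(pieces).sort_index()

print(via_transform.tolist())  # [nan, nan, 1.0, 4.0, 2.0]
```

Group a holds rows 0, 2, 4 (values 1, 2, 4) and group b holds rows 1, 3 (values 5, 9), so each group's first row is NaN and the rest are differences within the group.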

Note the NaN in rows 0 and 4: they correspond to the start of the x group and the y group, respectively.
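The NaN pattern holds in general: a row gets NaN exactly when it is the first row of its group in the frame's current order, whatever that order is. A minimal check on the question's ts/user columns, sorted by ts as in the question (kind='stable' is used here only to make the row order deterministic):

```python
import pandas as pd

ts = [1, 2, 3, 4, 60, 61, 62, 63, 64, 150, 155, 156] * 3
user = (['x'] * 4 + ['y'] + ['x'] * 7
        + ['y'] * 4 + ['x'] + ['y'] * 7
        + ['z'] * 12)
df = pd.DataFrame({'ts': ts, 'user': user}).sort_values('ts', kind='stable')

diff = df.groupby('user')['ts'].diff()

# True for the first row of each user in the current row order.
first_in_group = ~df['user'].duplicated()

print((diff.isna() == first_in_group).all())  # True
```

df[first_in_group] lists the three rows that start the x, y and z groups.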