import pandas as pd

df = pd.DataFrame(
    {'ts': [1,2,3,4,60,61,62,63,64,150,155,156,
            1,2,3,4,60,61,62,63,64,150,155,156,
            1,2,3,4,60,61,62,63,64,150,155,156],
     'id': [1,2,3,4,60,61,62,63,64,150,155,156,
            71,72,73,74,80,81,82,83,64,160,165,166,
            21,22,23,24,90,91,92,93,94,180,185,186],
     'other': ['x','x','x','x','x','x','x','x','x','x','x','x',
               'y','y','y','y','y','y','y','y','y','y','y','y',
               'z','z','z','z','z','z','z','z','z','z','z','z'],
     'user': ['x','x','x','x','y','x','x','x','x','x','x','x',
              'y','y','y','y','x','y','y','y','y','y','y','y',
              'z','z','z','z','z','z','z','z','z','z','z','z']
    })
df.set_index('id', inplace=True)
df.sort_values('ts', inplace=True)

for x, g in df.groupby('user'):
    # call 1: diff within each group, one group at a time
    print(g.ts.diff())

# call 2: the same diff through the groupby object
df.groupby('user').ts.diff()
I'm not sure why I get an error in call 2. I also noticed that when I remove sort_values, call 2 succeeds.
Can someone explain this behavior?
I get the error whether or not the sort is called. In any case, I think what you're looking for is:
df['group_diff'] = df.ts.groupby(df.user).transform(pd.Series.diff)
>>> df.head()
   other  ts user  group_diff
id
1      x   1    x         NaN
2      x   2    x         1.0
3      x   3    x         1.0
4      x   4    x         1.0
60     x  60    y         NaN
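After the groupby, you apply a transform, which uses a function to produce one output entry for every input entry within each group. Here that function is simply pd.Series.diff.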
Note the NaN in rows 0 and 4: they correspond to the start of the x group and the y group, respectively.
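To make the per-group behavior easier to see, here is a minimal sketch on a much smaller frame. The frame `small` and its values are made up for illustration and are not taken from the question; a lambda is used in place of pd.Series.diff, which should be equivalent here.

```python
import pandas as pd

# Hypothetical miniature of the data above: two users with interleaved rows.
small = pd.DataFrame(
    {
        "ts": [1, 2, 3, 60, 61, 64],
        "user": ["x", "x", "y", "x", "y", "y"],
    }
)

# transform returns a result aligned with the original rows,
# so it can be assigned straight back as a new column.
small["group_diff"] = small.groupby("user")["ts"].transform(lambda s: s.diff())

print(small)
```

The first row of each user comes out as NaN because it has no earlier row in its own group to subtract from, even when rows of another user sit directly above it.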