I have a DataFrame df:
   Group1 Group2  Val
0       1      Q    2
1       1      Q    3
2       2      R    8
3       4      Y    9
I want to add to df a column holding the list of values for each group, so the new df would be:
   Group1 Group2  Val     new
0       1      Q    2  [2, 3]
1       1      Q    3  [2, 3]
2       2      R    8     [8]
3       4      Y    9     [9]
What is the best way to do this?
You can't use groupby.transform directly, so use groupby.agg with map (or merge, if you have several groupers):
df['new'] = df['Group1'].map(df.groupby('Group1')['Val'].agg(list))
Output:
   Group1 Group2  Val     new
0       1      Q    2  [2, 3]
1       1      Q    3  [2, 3]
2       2      R    8     [8]
3       4      Y    9     [9]
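For reference, here is a minimal self-contained sketch of the map approach, assuming the sample data from the question. The intermediate name lists_per_group is only there for illustration and is not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Group1": [1, 1, 2, 4],
    "Group2": ["Q", "Q", "R", "Y"],
    "Val": [2, 3, 8, 9],
})

# One list of values per group, indexed by the group key.
lists_per_group = df.groupby("Group1")["Val"].agg(list)

# Look up each row's group key in that Series.
df["new"] = df["Group1"].map(lists_per_group)
print(df)
```

Rows of the same group most likely end up pointing at the same list object, so it is probably safest to treat those lists as read-only.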
Using multiple columns for grouping:
cols = ['Group1', 'Group2']
df['new'] = df.merge(df.groupby(cols)['Val'].agg(list),
                     left_on=cols, right_index=True, how='left')['Val_y']
Example:
   Group1 Group2  Val     new
0       1      Q    2  [2, 3]
1       1      Q    3  [2, 3]  # used Q here as example
2       2      R    8     [8]
3       4      Y    9     [9]
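A possible variant of the multi-column case, sketched under the assumption that the sample data is as shown: naming the aggregated column before merging avoids having to pick out the Val_y suffix. The names lists and out are illustrative only.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Group1": [1, 1, 2, 4],
    "Group2": ["Q", "Q", "R", "Y"],
    "Val": [2, 3, 8, 9],
})
cols = ["Group1", "Group2"]

# One row per (Group1, Group2) pair, with the list column already named "new".
lists = df.groupby(cols)["Val"].agg(list).rename("new").reset_index()

# A left merge on the key columns keeps the row order of df.
out = df.merge(lists, on=cols, how="left")
print(out)
```

Note that merging on columns gives the result a fresh default index, so this variant fits best when df has a default RangeIndex, as it does here.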
If you want to use transform, which I think may be a bit slower than the map approach in the other answer (or may not be), here is how to do it:
df['new'] = df.assign(
    Vals=df['Val'].values.reshape(-1, 1).tolist()
).groupby('Group1')['Vals'].transform(sum)
print(df)
   Group1 Group2  Val     new
0       1      Q    2  [2, 3]
1       1      Q    3  [2, 3]
2       2      R    8     [8]
3       4      Y    9     [9]
The temporary Vals column looks like this:
0    [2]
1    [3]
2    [8]
3    [9]
Name: Vals, dtype: object
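The idea behind this trick can be sketched in plain Python, without pandas: once every value is wrapped in its own single-element list, "summing" with + concatenates the lists within each group. This is only an illustration of the mechanism, not how pandas implements it internally; the names wrapped, per_group and new are hypothetical.

```python
vals = [2, 3, 8, 9]
groups = [1, 1, 2, 4]

# Wrap every value in its own list, as the temporary Vals column does.
wrapped = [[v] for v in vals]

# Adding lists with + concatenates them, so a per-group "sum" builds the list.
per_group = {}
for g, w in zip(groups, wrapped):
    per_group[g] = per_group.get(g, []) + w

# Broadcast each group's list back to its rows, which is the role of transform.
new = [per_group[g] for g in groups]
print(new)  # [[2, 3], [2, 3], [8], [9]]
```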