Combine unique elements of a DataFrame into a list



I will try to state my question as clearly as I can.

I have the following DataFrame:

import pandas as pd

data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player', 'game'])
print(df)

  player        game
0      A      Soccer
1      A  Basketball
2      A   Ping pong
3      B      Soccer
4      B      Tennis
5      B      Tennis
6      C    Baseball
7      C  Volleyball
8      C   Dodgeball

Now I want to keep only the values that are unique to each player, each listed once. Ideally they would be in a list, but that is not essential.

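For example, players A and B both have Soccer, so I don't want Soccer to appear in the output. Tennis appears twice, but both times for player B, so it should appear in the output.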

I would like the output to be:

  player        game
0      A  Basketball
1      A   Ping pong
2      B      Soccer
3      B      Tennis
4      C    Baseball
5      C  Volleyball
6      C   Dodgeball

Or like this:

  player  game
0      A  [Basketball, Ping pong]
1      B  [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]

Thanks for your help!

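It seems you need to remove duplicates with DataFrame.drop_duplicates on the game column, keeping the last row for each game (so Soccer stays with player B and the repeated Tennis collapses to one row). Then, if you want lists, group by player and aggregate with list: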

df = (df.drop_duplicates('game', keep='last')
        .groupby('player')['game']
        .agg(list)
        .reset_index())
print(df)

  player                               game
0      A            [Basketball, Ping pong]
1      B                   [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]
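If the flat form (the first desired output in the question) is enough, the same drop_duplicates call can be used without the groupby step. The sketch below shows that, plus a stricter variant that is only an assumption about one possible reading of the question: it drops every game shared by more than one player, so Soccer disappears entirely. The names flat, players_per_game and strict are illustrative and not from the original post.

```python
import pandas as pd

data = {
    'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
             'Tennis', 'Baseball', 'Volleyball', 'Dodgeball'],
}
df = pd.DataFrame(data, columns=['player', 'game'])

# Flat form: keep the last row for each game, then renumber the index.
flat = df.drop_duplicates('game', keep='last').reset_index(drop=True)
print(flat)

# Stricter reading (an assumption, not the desired output shown in the
# question): keep only games played by exactly one distinct player.
players_per_game = df.groupby('game')['player'].transform('nunique')
strict = (df[players_per_game == 1]
          .drop_duplicates(['player', 'game'])
          .reset_index(drop=True))
print(strict)
```

With the sample data, flat keeps Soccer for player B only, while strict has no Soccer row at all. Which of the two is right depends on what "unique" is meant to cover.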
