I have a DataFrame like this:
Col1 | Col2 | Score |
---|---|---|
A | B | 0.6 |
A | B | 0.6 |
B | A | 0.6 |
A | C | 0.8 |
C | A | 0.8 |
D | E | 0.9 |
IIUC, you can use a frozenset of the two columns as the grouper, so that (A, B) and (B, A) fall into the same group:
group = df[['Col1', 'Col2']].agg(frozenset, axis=1)
(df
 .groupby(group, as_index=False)  # you can also group by [group, 'Score']
 .agg(**{c: (c, 'first') for c in df},
      Duplicates=('Score', 'count'),
      )
)
Output:
  Col1 Col2  Score  Duplicates
0    A    B    0.6           3
1    A    C    0.8           2
2    D    E    0.9           1
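To make the idea easier to try out, here is a minimal self-contained sketch. It assumes the frame from the question is named `df`; the names `key` and `out` are only illustrative, and the key is built with a plain list comprehension rather than `agg(frozenset, axis=1)`. It relies on `frozenset` being hashable and order-insensitive, and it passes `sort=False` so the groups come out in order of first appearance.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "A", "C", "D"],
    "Col2": ["B", "B", "A", "C", "A", "E"],
    "Score": [0.6, 0.6, 0.6, 0.8, 0.8, 0.9],
})

# frozenset ignores order, so ('A', 'B') and ('B', 'A') produce equal keys
key = pd.Series(
    [frozenset(pair) for pair in zip(df["Col1"], df["Col2"])],
    index=df.index,
)

out = (
    df.groupby(key, sort=False)  # keep groups in order of first appearance
      .agg(
          Col1=("Col1", "first"),
          Col2=("Col2", "first"),
          Score=("Score", "first"),
          Duplicates=("Score", "count"),
      )
      .reset_index(drop=True)    # drop the frozenset index
)
print(out)
```

One design note: a frozenset collapses repeated values, so a row such as (A, A) would get the single-element key {A}. That is probably what you want here, but it is worth keeping in mind.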
Here is another approach using np.sort:
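import numpy as np

# Sort each (Col1, Col2) pair so that B/A becomes A/B; note this overwrites df in place
df[['Col1','Col2']] = np.sort(df[['Col1','Col2']].to_numpy(), axis=1)
(df.groupby(['Col1','Col2']).agg(
    Count=('Score','count'),
    Score=('Score','first'))
 .reset_index())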
Output:
  Col1 Col2  Count  Score
0    A    B      3    0.6
1    A    C      2    0.8
2    D    E      1    0.9
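If you would rather not overwrite the original columns, a variant along these lines might work. It is only a sketch: `pairs`, `normalized`, and `out` are illustrative names, and the sample frame is copied from the question. The row-wise sort is applied to a copy made with `assign`, so `df` itself stays unchanged.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "A", "C", "D"],
    "Col2": ["B", "B", "A", "C", "A", "E"],
    "Score": [0.6, 0.6, 0.6, 0.8, 0.8, 0.9],
})

# Sort each row's pair so ('B', 'A') becomes ('A', 'B')
pairs = np.sort(df[["Col1", "Col2"]].to_numpy(), axis=1)

# assign returns a new frame; df keeps its original values
normalized = df.assign(Col1=pairs[:, 0], Col2=pairs[:, 1])

out = (
    normalized.groupby(["Col1", "Col2"], as_index=False)
              .agg(Count=("Score", "count"), Score=("Score", "first"))
)
print(out)
```

Unlike the frozenset key, the sorted pair keeps both values as ordinary columns, so the result can be grouped, merged, or sorted on `Col1` and `Col2` directly.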