Groupby aggregation function: take the most frequent value, but if it is blank, take the second most frequent value

Edit: clarified the question

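I want to aggregate a pd.DataFrame called df by "Identifier" and sum the "Cost" columns. For the Category columns, I want to apply an aggregation function that could be described in words as "aggregate and take the most frequent value in the column (the mode), but if the mode is blank, take the second most frequent value". In other words, I want the mode of each category (after aggregation), but the mode must not be blank.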

The result should be a pd.DataFrame new_df.

df
Identifier  Cost  Cost2 Category1 Category2 Category3
0          A    10     10       one                 aaa
1          A    20     10                blue       aaa
2          B    10     20       two                 bbb
3          B    10     30               green       bbb
4          B    30     40                           bbb
5          C    20     50     three       red       ccc

---aggregation process--->

new_df
Identifier  Cost  Cost2 Category1 Category2 Category3
0          A    30     20       one      blue       aaa
1          B    50     90       two     green       bbb
2          C    20     50     three       red       ccc

Code to reproduce the example:

import pandas as pd
data_df = {       
'Identifier': ['A', 'A', 'B', 'B', 'B', 'C'],
'Cost': [10, 20, 10, 10, 30, 20],
'Cost2':[10,10,20,30,40,50],
'Category1' : ['one', '', 'two', '', '', 'three'],
'Category2' : ['', 'blue', '', 'green', '', 'red'],
'Category3' : ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc']
}
df = pd.DataFrame(data_df)

data_new_df = {       
'Identifier': ['A', 'B', 'C'],
'Cost': [30, 50, 20],
'Cost2' : [20,90,50],
'Category1' : ['one', 'two', 'three'],
'Category2' : ['blue', 'green', 'red'],
'Category3' : ['aaa', 'bbb', 'ccc']
}
new_df = pd.DataFrame(data_new_df)

Maybe you can try groupby followed by sum:

new_df = df.groupby('Identifier').apply(sum).drop('Identifier', axis=1).reset_index()

Result:

Identifier  Cost Category1 Category2
0          A    30       one      blue
1          B    50       two     green
2          C    20     three       red
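
One caveat worth checking before relying on a string sum: summing an object (string) column concatenates the values in each group. That happens to give the right answer for Category1 and Category2 in the example, because each group holds at most one non-blank value, but it would not for Category3. A minimal sketch of the effect, using two standalone Series (`ident` and `cat3` are names made up for this illustration, and the object dtype is set explicitly as an assumption so the behaviour does not depend on the default string dtype):

```python
import pandas as pd

# Hypothetical standalone copies of the Identifier and Category3 columns
ident = pd.Series(['A', 'A', 'B', 'B', 'B', 'C'])
cat3 = pd.Series(['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc'], dtype=object)

# Summing strings within each group joins them end to end
summed = cat3.groupby(ident).sum()
print(summed.tolist())  # ['aaaaaa', 'bbbbbbbbb', 'ccc']
```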

You can try:

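# Sum the columns per Identifier
new_df = df.groupby('Identifier').sum().reset_index()
# Take the non-blank values in row order and assign them by position.
# This assumes each group has exactly one non-blank value, in group order.
new_df['Category1'] = df.loc[df.Category1 != '', 'Category1'].reset_index(drop=True)
new_df['Category2'] = df.loc[df.Category2 != '', 'Category2'].reset_index(drop=True)
new_df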

Result:

Identifier  Cost Category1 Category2
0          A    30       one      blue
1          B    50       two     green
2          C    20     three       red
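
Neither approach above actually computes a mode, so here is a rough sketch of the aggregation the question describes in words: drop the blanks first, then take the mode of what is left. The helper name `nonblank_mode` is made up for this illustration, and returning '' when a group has only blanks is an assumption, since the question does not say what should happen in that case. On ties, `Series.mode()` returns the modes sorted, so the first one is taken.

```python
import pandas as pd

def nonblank_mode(values):
    """Most frequent non-blank value of a Series, or '' if every value is blank."""
    # Drop empty strings (and missing values) before counting
    kept = values[values.notna() & (values != '')]
    if kept.empty:
        return ''  # assumption: an all-blank group stays blank
    # mode() returns all modes sorted; take the first one on ties
    return kept.mode().iloc[0]

df = pd.DataFrame({
    'Identifier': ['A', 'A', 'B', 'B', 'B', 'C'],
    'Cost': [10, 20, 10, 10, 30, 20],
    'Cost2': [10, 10, 20, 30, 40, 50],
    'Category1': ['one', '', 'two', '', '', 'three'],
    'Category2': ['', 'blue', '', 'green', '', 'red'],
    'Category3': ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc'],
})

# Named aggregation: sum the cost columns, non-blank mode for the categories
new_df = (
    df.groupby('Identifier')
      .agg(
          Cost=('Cost', 'sum'),
          Cost2=('Cost2', 'sum'),
          Category1=('Category1', nonblank_mode),
          Category2=('Category2', nonblank_mode),
          Category3=('Category3', nonblank_mode),
      )
      .reset_index()
)
print(new_df)
```

On the example data this should reproduce the new_df from the question. Filtering before counting means a blank can never win, even when it is the most common value in a group (as in group B for Category1).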
