I have the following DataFrame:
import pandas as pd
data = dict(name=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
            objective=[20.0, 20.0, 25.0, 40.0, 40.5, 41.0, 60.0, 60.0],
            price=[0.5, 1.0, 1.5, 1.0, 1.2, 1.4, 0.5, 1.0])
df = pd.DataFrame(data, columns=data.keys())
I can then find all the unique combinations of these columns like this:
df.groupby(['name','objective', 'price']).size()
which looks like this:
name  objective  price
a     20.0       0.5      1
                 1.0      1
      25.0       1.5      1
b     40.0       1.0      1
      40.5       1.2      1
      41.0       1.4      1
c     60.0       0.5      1
                 1.0      1
When a given name and objective have more than one price value, I only want to keep the lowest price, i.e.:
name  objective  price
a     20.0       0.5      1
      25.0       1.5      1
b     40.0       1.0      1
      40.5       1.2      1
      41.0       1.4      1
c     60.0       0.5      1
How can I do this?
You can do another groupby and take first:
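(df.groupby(['name', 'objective', 'price']).size()
   .reset_index()                   # back to a flat frame; the counts land in column 0
   .groupby(['name', 'objective'])  # one group per (name, objective) pair
   .first()                         # rows are already sorted by price within each pair, so this picks the lowest
)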
Output:
                price  0
name objective
a    20.0         0.5  1
     25.0         1.5  1
b    40.0         1.0  1
     40.5         1.2  1
     41.0         1.4  1
c    60.0         0.5  1
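The .first() approach works because the first groupby sorts by name, objective and price, so after reset_index() each (name, objective) pair already lists its prices in ascending order. If you would rather not rely on that ordering, here is a sketch of a different technique (swapped in here, not part of the answer above): idxmin returns the row label of the cheapest price in each group, and those rows are then selected directly. The names min_idx and lowest are only illustrative.

```python
import pandas as pd

data = dict(name=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
            objective=[20.0, 20.0, 25.0, 40.0, 40.5, 41.0, 60.0, 60.0],
            price=[0.5, 1.0, 1.5, 1.0, 1.2, 1.4, 0.5, 1.0])
df = pd.DataFrame(data)

# For each (name, objective) group, the row label holding the lowest price
min_idx = df.groupby(['name', 'objective'])['price'].idxmin()

# Pick exactly those rows; the result does not depend on the original row order
lowest = df.loc[min_idx].reset_index(drop=True)
print(lowest)
```

Because whole rows are selected, any extra columns stay consistent with the chosen price.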
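What I would do:
# Sort by price ascending, keep the first (cheapest) row of each (name, objective) pair, then restore the original row order
df.sort_values('price').drop_duplicates(['name', 'objective'], keep='first').sort_index().assign(cnt=1)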
Out[421]:
  name  objective  price  cnt
0    a       20.0    0.5    1
2    a       25.0    1.5    1
3    b       40.0    1.0    1
4    b       40.5    1.2    1
5    b       41.0    1.4    1
6    c       60.0    0.5    1
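The keep argument of drop_duplicates decides which price survives: after sort_values('price') the rows run from cheapest to most expensive, so keep='first' retains the lowest price of each (name, objective) pair, while keep='last' would retain the highest. A small sketch comparing both on the same data (lowest and highest are illustrative names):

```python
import pandas as pd

data = dict(name=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
            objective=[20.0, 20.0, 25.0, 40.0, 40.5, 41.0, 60.0, 60.0],
            price=[0.5, 1.0, 1.5, 1.0, 1.2, 1.4, 0.5, 1.0])
df = pd.DataFrame(data)

by_price = df.sort_values('price')  # ascending: cheapest rows first

# First occurrence per pair = lowest price; sort_index restores the original order
lowest = by_price.drop_duplicates(['name', 'objective'], keep='first').sort_index()
# Last occurrence per pair = highest price
highest = by_price.drop_duplicates(['name', 'objective'], keep='last').sort_index()

print(lowest['price'].tolist())   # [0.5, 1.5, 1.0, 1.2, 1.4, 0.5]
print(highest['price'].tolist())  # [1.0, 1.5, 1.0, 1.2, 1.4, 1.0]
```

Only the pairs with several prices (a/20.0 and c/60.0) differ between the two.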
You can use groupby with min:
df = df.groupby(['name', 'objective']).min()  # lowest price per (name, objective); the pair becomes the index
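This version drops the count column and turns name and objective into the index. If you want them back as ordinary columns, a minimal sketch (assuming price is the only column you need) is to pass as_index=False; lowest is an illustrative name:

```python
import pandas as pd

data = dict(name=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
            objective=[20.0, 20.0, 25.0, 40.0, 40.5, 41.0, 60.0, 60.0],
            price=[0.5, 1.0, 1.5, 1.0, 1.2, 1.4, 0.5, 1.0])
df = pd.DataFrame(data)

# as_index=False keeps name and objective as regular columns instead of an index
lowest = df.groupby(['name', 'objective'], as_index=False)['price'].min()
print(lowest)
```

Keep in mind that min() on a whole grouped DataFrame takes the minimum of each column independently, so if the frame had more columns their values might not come from the same row as the lowest price.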