This is my data:
Column IV Source
RRD 5.795765 Personal_Demographics
RRD 5.795765 Cust360_Agreement
RRD 5.792729 External_Data
WO 4.361066 Cust360_Asset
Rating 3.600918 Personal_Demographics
My expected result:
Column IV Source
RRD 5.795765 Personal_Demographics
WO 4.361066 Cust360_Asset
Rating 3.600918 Personal_Demographics
What I tried:
inds = df.groupby('Column')['IV'].transform('max') == df['IV']
df[inds]
But the result is:
Column IV Source
RRD 5.795765 Personal_Demographics
RRD 5.795765 Cust360_Agreement
WO 4.361066 Cust360_Asset
Rating 3.600918 Personal_Demographics
The first one (RRD) has tied values, but I need only one row for it, with output like this:
Column IV Source
RRD 5.795765 Personal_Demographics
WO 4.361066 Cust360_Asset
Rating 3.600918 Personal_Demographics
Regards
Try drop_duplicates + sort_values:
out = df.sort_values('IV',ascending=False).drop_duplicates('Column')
Out[121]:
Column IV Source
0 RRD 5.795765 Personal_Demographics
3 WO 4.361066 Cust360_Asset
4 Rating 3.600918 Personal_Demographics
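For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample data from the question and runs the same sort-then-deduplicate step. The kind='mergesort' argument is an addition that is not in the line above; it is assumed here only to make the sort stable, so that among tied IV values the row that comes first in the input is the one kept.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column": ["RRD", "RRD", "RRD", "WO", "Rating"],
    "IV": [5.795765, 5.795765, 5.792729, 4.361066, 3.600918],
    "Source": [
        "Personal_Demographics",
        "Cust360_Agreement",
        "External_Data",
        "Cust360_Asset",
        "Personal_Demographics",
    ],
})

# Sort by IV descending, then keep the first row seen for each Column.
# kind="mergesort" (an assumption, not in the original answer) keeps tied
# rows in their input order.
out = (
    df.sort_values("IV", ascending=False, kind="mergesort")
      .drop_duplicates("Column")
)
print(out)
```

drop_duplicates keeps the first occurrence by default (keep='first'), which is why sorting in descending order first leaves the maximum IV row per Column.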
If you want to use groupby:
df.sort_values('IV',ascending=False).groupby(['Column']).head(1)
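A further option that avoids sorting, offered here as a sketch and not as part of the answer above, is idxmax. It returns the index label of the first maximum in each group, so ties resolve to the earliest row. The variable name best is made up for illustration, and sort=False is assumed only to keep the groups in their order of appearance.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column": ["RRD", "RRD", "RRD", "WO", "Rating"],
    "IV": [5.795765, 5.795765, 5.792729, 4.361066, 3.600918],
    "Source": [
        "Personal_Demographics",
        "Cust360_Agreement",
        "External_Data",
        "Cust360_Asset",
        "Personal_Demographics",
    ],
})

# idxmax gives one index label per group: the first row holding the max IV.
# sort=False keeps groups in order of first appearance (RRD, WO, Rating).
best_idx = df.groupby("Column", sort=False)["IV"].idxmax()

# Select those rows from the original frame.
best = df.loc[best_idx]
print(best)
```

This assumes the index labels of df are unique; with a duplicated index, df.loc would return more than one row per label.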