How to apply multiple operations to multiple columns based on a single column in pandas



I have a sample DataFrame that looks like this:

    primaryName   averageRating  primaryProfession               knownForTitles  runtimeMinutes
1   Fred Astaire  7.0            soundtrack,actor,miscellaneous  tt0072308       165
2   Fred Astaire  6.9            soundtrack,actor,miscellaneous  tt0031983       93
3   Fred Astaire  7.0            soundtrack,actor,miscellaneous  tt0050419       103
4   Fred Astaire  7.1            soundtrack,actor,miscellaneous  tt0053137       134
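To make the question reproducible, the sample frame above can be rebuilt directly in code. This is a minimal sketch: the values are copied from the printed table, and the explicit index 1..4 only mirrors how the rows were printed.

```python
import pandas as pd

# Rebuild the four sample rows shown above (index 1..4 as printed)
df = pd.DataFrame(
    {
        "primaryName": ["Fred Astaire"] * 4,
        "averageRating": [7.0, 6.9, 7.0, 7.1],
        "primaryProfession": ["soundtrack,actor,miscellaneous"] * 4,
        "knownForTitles": ["tt0072308", "tt0031983", "tt0050419", "tt0053137"],
        "runtimeMinutes": [165, 93, 103, 134],
    },
    index=[1, 2, 3, 4],
)
print(df)
```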

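So basically, grouping by the primaryName column, I want to take the mean of the averageRating column, extract "actor" or "actress" from the primaryProfession column, count the knownForTitles, and sum the runtimeMinutes. The output DataFrame should look like this: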

    primaryName   averageRating  primaryProfession  knownForTitles  runtimeMinutes
1   Fred Astaire  28             actor              4               495

Any ideas how I can do this? Thanks in advance for your help.

Try this:

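# df is the DataFrame from the question; collapse each profession list to a single label
df.loc[df['primaryProfession'].str.contains('actor', na=False), 'primaryProfession'] = 'actor'
df.loc[df['primaryProfession'].str.contains('actress', na=False), 'primaryProfession'] = 'actress'
# Group by name and profession, applying a different aggregation to each column
result = df.groupby(['primaryName', 'primaryProfession'], as_index=False).agg({'averageRating': 'mean', 'knownForTitles': 'count', 'runtimeMinutes': 'sum'})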
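For reference, here is a self-contained sketch of the same idea that runs end to end on the sample data. It swaps in two techniques not used in the answer: `Series.str.extract` with a word-boundary regex, which pulls out `actor` or `actress` in one pass, and named aggregation (pandas 0.25 and later), which names each output column explicitly. Rows whose profession mentions neither label get NaN and are dropped by `groupby`; that is an assumption about the desired behaviour.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame(
    {
        "primaryName": ["Fred Astaire"] * 4,
        "averageRating": [7.0, 6.9, 7.0, 7.1],
        "primaryProfession": ["soundtrack,actor,miscellaneous"] * 4,
        "knownForTitles": ["tt0072308", "tt0031983", "tt0050419", "tt0053137"],
        "runtimeMinutes": [165, 93, 103, 134],
    }
)

# Keep the first whole-word "actor" or "actress"; NaN if neither appears
df["primaryProfession"] = df["primaryProfession"].str.extract(
    r"\b(actor|actress)\b", expand=False
)

# Named aggregation: output_column=(input_column, aggregation)
result = df.groupby(["primaryName", "primaryProfession"], as_index=False).agg(
    averageRating=("averageRating", "mean"),
    knownForTitles=("knownForTitles", "count"),
    runtimeMinutes=("runtimeMinutes", "sum"),
)
print(result)
```

On the sample data this yields one row with averageRating 7.0 (the mean of the four ratings), knownForTitles 4 and runtimeMinutes 495. If the 28 in the expected output was meant literally, use "sum" instead of "mean" for averageRating.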
