I have a sample dataframe that looks like this:
   primaryName   averageRating  primaryProfession               knownForTitles  runtimeMinutes
1  Fred Astaire  7.0            soundtrack,actor,miscellaneous  tt0072308       165
2  Fred Astaire  6.9            soundtrack,actor,miscellaneous  tt0031983       93
3  Fred Astaire  7.0            soundtrack,actor,miscellaneous  tt0050419       103
4  Fred Astaire  7.1            soundtrack,actor,miscellaneous  tt0053137       134
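So basically, grouping by the primaryName column, I want to take the mean of the averageRating column, keep only "actor"/"actress" from the primaryProfession column, count the knownForTitles, and sum the runtimeMinutes. The output dataframe should look like this: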
   primaryName   averageRating  primaryProfession  knownForTitles  runtimeMinutes
1  Fred Astaire  28             actor              4               495
Any ideas how I can do this? Thanks in advance for your help.
Try this:
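# Collapse each profession list to a single label; na=False treats missing professions as non-matches
df.loc[df['primaryProfession'].str.contains('actor', na=False), 'primaryProfession'] = 'actor'
df.loc[df['primaryProfession'].str.contains('actress', na=False), 'primaryProfession'] = 'actress'
# One row per person and profession: mean rating, number of titles, total runtime
df.groupby(['primaryName', 'primaryProfession'], as_index=False).agg({'averageRating': 'mean', 'knownForTitles': 'count', 'runtimeMinutes': 'sum'})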
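For comparison, here is a self-contained sketch that rebuilds the sample data by hand and applies the same grouping. It swaps the two str.contains assignments for a single str.extract with a word-boundary regex, drops rows that are neither actor nor actress (an assumption about what the question wants), and uses named aggregation so the output columns keep their names. Note that the mean of 7.0, 6.9, 7.0 and 7.1 is 7.0; the 28 in the expected output is their sum, so if that figure is really what you want, use 'sum' instead of 'mean' for averageRating.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for this sketch.
df = pd.DataFrame({
    "primaryName": ["Fred Astaire"] * 4,
    "averageRating": [7.0, 6.9, 7.0, 7.1],
    "primaryProfession": ["soundtrack,actor,miscellaneous"] * 4,
    "knownForTitles": ["tt0072308", "tt0031983", "tt0050419", "tt0053137"],
    "runtimeMinutes": [165, 93, 103, 134],
})

# Assumption: each profession string mentions at most one of actor/actress.
# Non-matching rows become NaN; missing professions are treated as empty strings.
prof = df["primaryProfession"].fillna("")
df["primaryProfession"] = prof.str.extract(r"\b(actor|actress)\b", expand=False)

# Keep only actor/actress rows.
df = df.dropna(subset=["primaryProfession"])

# Named aggregation: output_column=(input_column, function).
result = df.groupby(["primaryName", "primaryProfession"], as_index=False).agg(
    averageRating=("averageRating", "mean"),
    knownForTitles=("knownForTitles", "count"),
    runtimeMinutes=("runtimeMinutes", "sum"),
)
print(result)
```

With the sample above this produces a single Fred Astaire / actor row with 4 titles and 495 minutes of runtime.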