I have this DataFrame:
df
   userId  movieId  rating                          genres
0      41    97921     4.0             Comedy|Drama|Sci-Fi
1      47    97921     3.5             Comedy|Drama|Sci-Fi
2     594      539     5.0  Comedy|Drama|Romance|Adventure
3       4      539     5.0  Comedy|Drama|Romance|Adventure
4     113     1733     4.0                   Drama|Romance
5     594     1733     5.0                   Drama|Romance
I also have a list of all the genres:
genres = ['Comedy','Drama','Romance','Action','Adventure','Sci-Fi','Thriller','Crime',
'Animation','Children','Musical','Film-Noir','Fantasy','War','Mystery','IMAX',
'Horror','Western','Documentary' ]
I want to count how many times each genre occurs in the DataFrame.
Expected Output:
Comedy: 4
Drama: 6
Sci-Fi: 2
Romance: 4
Adventure: 2
You can use:
df['genres'].str.split('|').explode().value_counts().to_dict() #requires pandas 0.25+
#{'Drama': 6, 'Comedy': 4, 'Romance': 4, 'Sci-Fi': 2, 'Adventure': 2}
Or:
df['genres'].str.get_dummies().sum().to_dict()
#{'Adventure': 2, 'Comedy': 4, 'Drama': 6, 'Romance': 4, 'Sci-Fi': 2}
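If you also want the genres from your list that never appear in the data, one possible approach is to reindex the counts against that list. The following is a minimal self-contained sketch that rebuilds the sample frame from the question; the names `counts` and `all_counts` are only illustrative, and `fill_value=0` is my assumption about how absent genres should be reported.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "userId": [41, 47, 594, 4, 113, 594],
    "movieId": [97921, 97921, 539, 539, 1733, 1733],
    "rating": [4.0, 3.5, 5.0, 5.0, 4.0, 5.0],
    "genres": [
        "Comedy|Drama|Sci-Fi",
        "Comedy|Drama|Sci-Fi",
        "Comedy|Drama|Romance|Adventure",
        "Comedy|Drama|Romance|Adventure",
        "Drama|Romance",
        "Drama|Romance",
    ],
})

genres = ['Comedy', 'Drama', 'Romance', 'Action', 'Adventure', 'Sci-Fi',
          'Thriller', 'Crime', 'Animation', 'Children', 'Musical',
          'Film-Noir', 'Fantasy', 'War', 'Mystery', 'IMAX', 'Horror',
          'Western', 'Documentary']

# Split each row on '|', put every genre on its own row, then count them.
counts = df["genres"].str.split("|").explode().value_counts()

# Align the counts with the full genre list; genres absent from df get 0
# (illustrative choice, not something the question asked for).
all_counts = counts.reindex(genres, fill_value=0)

print(all_counts.to_dict())
```

Dropping the `reindex` step gives the same result as the first one-liner above, so it is only worth keeping if the zero entries matter to you.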