我有一个数据集,在那里我需要找到每种类型的平均评分前2名的电影。在给定的数据布局下,我如何实现它?有人能帮我理解一个变通办法吗?
为了简化它,我创建了新的列,将值从"流派"分离为独特的流派,并为其分配值1和0。因此,分组依据将基于这些新的独特流派,而不是"流派"列。例如,如果我将"Comedy"列过滤为1,标题"abc"、"pqr"、"hi"one_answers"mno"的平均评分分别为3.75、2.9、2.25和3.12。因此,根据喜剧类型的平均评分,排名前两的电影将是abc和mno。其他独特的流派也将遵循类似的逻辑。
我的原始数据集与下面的示例数据集非常相似:
| movieId | title | genres | userId | rating | release_year | review_year | Children | Action | Mystery | Sci-Fi | (no genres listed) | Drama | Fantasy | Film-Noir | Romance | Thriller | Western | Comedy | Documentary | Crime | War | Horror | Animation | IMAX | Adventure | Musical |
|:-------:|:-----:|:-----------------------------------------------:|:------:|:------:|--------------|:-----------:|:--------:|:------:|:-------:|:------:|:--------------------:|:-----:|:-------:|:---------:|:-------:|:--------:|:-------:|:------:|:-----------:|:-----:|:---:|:------:|:---------:|:----:|:---------:|:-------:|
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 3437 | 3.5 | 1995 | 2009 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 7107 | 3 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 17211 | 4 | 1995 | 2010 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 18056 | 3 | 1995 | 2013 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 22223 | 5 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 28268 | 3.5 | 1995 | 2016 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 32984 | 4.5 | 1995 | 2013 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 33260 | 0.5 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 35223 | 3.5 | 1995 | 2009 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 54852 | 4.5 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 56556 | 4 | 1995 | 2016 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 59622 | 5 | 1995 | 2017 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 78317 | 4.5 | 1995 | 2011 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 80922 | 4 | 1995 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 8667 | 3 | 2000 | 2010 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 16458 | 2 | 2000 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 16464 | 3.5 | 2000 | 2011 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 19786 | 4 | 2000 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 30494 | 3.5 | 2000 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | pqr | Comedy__Romance | 204961 | 4 | 2008 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 220587 | 3.5 | 2008 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 224358 | 3 | 2008 | 2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 231990 | 2.5 | 2008 | 2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 234484 | 3 | 2008 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 251794 | 1.5 | 2008 | 2011 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 4294 | 3 | 2012 | 2016 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 81593 | 2.5 | 2012 | 2009 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 114192 | 2.5 | 2012 | 2013 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 178767 | 0.5 | 2012 | 2014 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 212863 | 2 | 2012 | 2005 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 251932 | 3 | 2012 | 2005 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 16481 | 4 | 2015 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 25066 | 4 | 2015 | 2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 27833 | 3 | 2015 | 2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 31864 | 1.5 | 2015 | 2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
作为数据帧:
pd.DataFrame({'movieId': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5],
'title': ['abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','xyz','xyz','xyz','xyz','xyz','pqr','pqr','pqr','pqr','pqr','pqr','ghi','ghi','ghi','ghi','ghi','ghi','mno','mno','mno','mno'],
'genres': ['Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy','Comedy','Comedy','Comedy'],
'userId': [3437,7107,17211,18056,22223,28268,32984,33260,35223,54852,56556,59622,78317,80922,8667,16458,16464,19786,30494,204961,220587,224358,231990,234484,251794,4294,81593,114192,178767,212863,251932,16481,25066,27833,31864],
'rating': [3.5,3.0,4.0,3.0,5.0,3.5,4.5,0.5,3.5,4.5,4.0,5.0,4.5,4.0,3.0,2.0,3.5,4.0,3.5,4.0,3.5,3.0,2.5,3.0,1.5,3.0,2.5,2.5,0.5,2.0,3.0,4.0,4.0,3.0,1.5],
'release_year': [1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,2000,2000,2000,2000,2000,2008,2008,2008,2008,2008,2008,2012,2012,2012,2012,2012,2012,2015,2015,2015,2015],
'review_year': [2009,2014,2010,2013,2014,2016,2013,2014,2009,2014,2016,2017,2011,2005,2010,2005,2011,2005,2005,2016,2016,2014,2005,2016,2011,2016,2009,2013,2014,2005,2005,2016,2014,2005,2014],
'Children': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Action': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Mystery': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Sci-Fi': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'(no genres listed)': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Drama': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0],
'Fantasy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Film-Noir': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Romance': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0],
'Thriller': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Western': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Comedy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
'Documentary': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Crime': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'War': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Horror': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Animation': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'IMAX': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Adventure': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Musical': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})
我的输出与下面的类似:
| Genre | title | avg rating |
|-----------|-------|------------|
| Children | abc | 3.75 |
| Drama | ghi | 2.25 |
| Fantasy | abc | 3.75 |
| Fantasy | xyz | 3.2 |
| Romance | pqr | 2.9 |
| Romance | ghi | 2.25 |
| Comedy | abc | 3.75 |
| Comedy | mno | 3.12 |
| Animation | abc | 3.75 |
| Adventure | abc | 3.75 |
| Adventure | xyz | 3.2 |
请注意,这是一个样本数据集,有许多独特的流派的值为0。因此,这些独特的流派并没有出现在最终输出的"流派"列中。此外,一些独特的类型只有一部电影属于他们的类别,所以他们只有一个这些电影的列表。
如果要获得每个类型的前2部电影,首先需要melt
伪变量,然后是groupby
+agg
:
(df.melt(id_vars=df.columns[:7], var_name='genre')
.query('value == 1')
.groupby(['genre', 'movieId'], as_index=False).agg({'title': 'first', 'rating': 'mean'})
.groupby('genre').head(2)
)
输出:
genre movieId title rating
0 Adventure 1 abc 3.750000
1 Adventure 2 xyz 3.200000
2 Animation 1 abc 3.750000
3 Children 1 abc 3.750000
4 Children 2 xyz 3.200000
5 Comedy 1 abc 3.750000
6 Comedy 3 pqr 2.916667
9 Drama 4 ghi 2.250000
10 Fantasy 1 abc 3.750000
11 Fantasy 2 xyz 3.200000
12 Romance 3 pqr 2.916667
13 Romance 4 ghi 2.250000