如何在Python中基于具有布尔值的多个列来查找列的平均值



我有一个数据集,在那里我需要找到每种类型的平均评分前2名的电影。在给定的数据布局下,我如何实现它?有人能帮我理解一个变通办法吗?

为了简化它,我创建了新的列,将值从"流派"分离为独特的流派,并为其分配值1和0。因此,分组依据将基于这些新的独特流派,而不是"流派"列。例如,如果我将"Comedy"列过滤为1,标题"abc"、"pqr"、"hi"one_answers"mno"的平均评分分别为3.75、2.9、2.25和3.12。因此,根据喜剧类型的平均评分,排名前两的电影将是abc和mno。其他独特的流派也将遵循类似的逻辑。

我的原始数据集与下面的示例数据集非常相似:

| movieId | title |                      genres                     | userId | rating | release_year | review_year | Children | Action | Mystery | Sci-Fi | (no genres   listed) | Drama | Fantasy | Film-Noir | Romance | Thriller | Western | Comedy | Documentary | Crime | War | Horror | Animation | IMAX | Adventure | Musical |
|:-------:|:-----:|:-----------------------------------------------:|:------:|:------:|--------------|:-----------:|:--------:|:------:|:-------:|:------:|:--------------------:|:-----:|:-------:|:---------:|:-------:|:--------:|:-------:|:------:|:-----------:|:-----:|:---:|:------:|:---------:|:----:|:---------:|:-------:|
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 3437   | 3.5    | 1995         | 2009        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 7107   | 3      | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 17211  | 4      | 1995         | 2010        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 18056  | 3      | 1995         | 2013        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 22223  | 5      | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 28268  | 3.5    | 1995         | 2016        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 32984  | 4.5    | 1995         | 2013        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 33260  | 0.5    | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 35223  | 3.5    | 1995         | 2009        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 54852  | 4.5    | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 56556  | 4      | 1995         | 2016        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 59622  | 5      | 1995         | 2017        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 78317  | 4.5    | 1995         | 2011        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 80922  | 4      | 1995         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 8667   | 3      | 2000         | 2010        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 16458  | 2      | 2000         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 16464  | 3.5    | 2000         | 2011        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 19786  | 4      | 2000         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 30494  | 3.5    | 2000         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 204961 | 4      | 2008         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 220587 | 3.5    | 2008         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 224358 | 3      | 2008         | 2014        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 231990 | 2.5    | 2008         | 2005        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 234484 | 3      | 2008         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 251794 | 1.5    | 2008         | 2011        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 4294   | 3      | 2012         | 2016        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 81593  | 2.5    | 2012         | 2009        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 114192 | 2.5    | 2012         | 2013        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 178767 | 0.5    | 2012         | 2014        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 212863 | 2      | 2012         | 2005        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 251932 | 3      | 2012         | 2005        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 16481  | 4      | 2015         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 25066  | 4      | 2015         | 2014        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 27833  | 3      | 2015         | 2005        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 31864  | 1.5    | 2015         | 2014        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |

作为数据帧:

pd.DataFrame({'movieId': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5],
'title': ['abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','xyz','xyz','xyz','xyz','xyz','pqr','pqr','pqr','pqr','pqr','pqr','ghi','ghi','ghi','ghi','ghi','ghi','mno','mno','mno','mno'],
'genres': ['Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy','Comedy','Comedy','Comedy'],
'userId': [3437,7107,17211,18056,22223,28268,32984,33260,35223,54852,56556,59622,78317,80922,8667,16458,16464,19786,30494,204961,220587,224358,231990,234484,251794,4294,81593,114192,178767,212863,251932,16481,25066,27833,31864],
'rating': [3.5,3.0,4.0,3.0,5.0,3.5,4.5,0.5,3.5,4.5,4.0,5.0,4.5,4.0,3.0,2.0,3.5,4.0,3.5,4.0,3.5,3.0,2.5,3.0,1.5,3.0,2.5,2.5,0.5,2.0,3.0,4.0,4.0,3.0,1.5],
'release_year': [1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,2000,2000,2000,2000,2000,2008,2008,2008,2008,2008,2008,2012,2012,2012,2012,2012,2012,2015,2015,2015,2015],
'review_year': [2009,2014,2010,2013,2014,2016,2013,2014,2009,2014,2016,2017,2011,2005,2010,2005,2011,2005,2005,2016,2016,2014,2005,2016,2011,2016,2009,2013,2014,2005,2005,2016,2014,2005,2014],
'Children': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Action': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Mystery': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Sci-Fi': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'(no genres listed)': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Drama': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0],
'Fantasy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Film-Noir': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Romance': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0],
'Thriller': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Western': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Comedy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
'Documentary': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Crime': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'War': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Horror': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Animation': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'IMAX': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Adventure': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Musical': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})

我的输出与下面的类似:

| Genre     | title | avg rating |
|-----------|-------|------------|
| Children  | abc   | 3.75       |
| Drama     | ghi   | 2.25       |
| Fantasy   | abc   | 3.75       |
| Fantasy   | xyz   | 3.2        |
| Romance   | pqr   | 2.9        |
| Romance   | ghi   | 2.25       |
| Comedy    | abc   | 3.75       |
| Comedy    | mno   | 3.12       |
| Animation | abc   | 3.75       |
| Adventure | abc   | 3.75       |
| Adventure | xyz   | 3.2        |

请注意,这是一个样本数据集,有许多独特的流派的值为0。因此,这些独特的流派并没有出现在最终输出的"流派"列中。此外,一些独特的类型只有一部电影属于他们的类别,所以他们只有一个这些电影的列表。

如果要获得每个类型的前2部电影,首先需要melt伪变量,然后是groupby+agg:

(df.melt(id_vars=df.columns[:7], var_name='genre')
.query('value == 1')
.groupby(['genre', 'movieId'], as_index=False).agg({'title': 'first', 'rating': 'mean'})
.groupby('genre').head(2)
)

输出:

genre  movieId title    rating
0   Adventure        1   abc  3.750000
1   Adventure        2   xyz  3.200000
2   Animation        1   abc  3.750000
3    Children        1   abc  3.750000
4    Children        2   xyz  3.200000
5      Comedy        1   abc  3.750000
6      Comedy        3   pqr  2.916667
9       Drama        4   ghi  2.250000
10    Fantasy        1   abc  3.750000
11    Fantasy        2   xyz  3.200000
12    Romance        3   pqr  2.916667
13    Romance        4   ghi  2.250000

最新更新