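Hello!
Another question from your friendly neighbor.
In a previous post I saw the explode function being used (I'm new to Python), and it solved that problem. I've since been trying to use it in a different way, but I can't seem to get it to work.
I have this: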
[{'Name': 'Andes, The',
'Year': 2021,
'Score': '8',
'2nd Score': 8.8,
'% of People': '87%',
'Country': 'The Netherlands',
'Fruit': 'The Apple',
'Export Countries': 'United States,United Kingdom',
'Language': 'English,Japanese,French',
'Transit Duration': 148.0,
'Quality': 1.0,
'Taste': 0.0,
'Freshness': 0.0,
'Packaging': 0.0},
{'Name': 'Phil',
'Year': 2021,
'Score': '8.5',
'2nd Score': 8.8,
'% of People': '87%',
'Country': 'Spain',
'Fruit': 'The Banana',
'Export Countries': 'United Kingdom, Germany',
'Language': 'English,German,French,Italian',
'Transit Duration': 118.0,
'Quality': 1.0,
'Taste': 0.0,
'Freshness': 0.0,
'Packaging': 0.0},
{'Name': 'Sarah',
'Year': 2021,
'Score': '9',
'2nd Score': 8.8,
'% of People': '89%',
'Country': 'Greece',
'Fruit': 'The Plum',
'Export Countries': 'Germany,United States',
'Language': 'English,German,French,Italian',
'Transit Duration': 165.0,
'Quality': 1.0,
'Taste': 0.0,
'Freshness': 0.0,
'Packaging': 0.0},
{'Name': 'William',
'Year': 2021,
'Score': '6',
'2nd Score': 8.8,
'% of People': '65%',
'Country': 'Brazil',
'Fruit': 'Strawberries',
'Export Countries': 'Spain,Greece',
'Language': 'English,Spanish,French',
'Transit Duration': 153.0,
'Quality': 1.0,
'Taste': 0.0,
'Freshness': 0.0,
'Packaging': 0.0}]
Or, put simply, this:
Name | Year | Score | 2nd Score | % of People | Country | Fruit | Export Countries | Language | Transit Duration | Quality | Taste | Freshness | Packaging
Andes, The | 2021 | 8 | 8.8 | 87% | The Netherlands | The Apple | United States,United Kingdom | English,Japanese,French | 148.0 | 1.0 | 0.0 | 0.0 | 0.0
Phil | 2021 | 8.5 | 8.4 | 87% | Spain | The Banana | United Kingdom, Germany | English,German,French,Italian | 165.0 | 1.0 | 0.0 | 0.0 | 0.0
Sarah | 2021 | 9 | 8.3 | 89% | Greece | The Plum | Germany,United States | English,German,French,Italian | 153.0 | 1.0 | 0.0 | 0.0 | 0.0
William | 2021 | 6 | 8.8 | 65% | Brazil | Strawberries | Spain,Greece | English,Spanish,French | 153.0 | 1.0 | 0.0 | 0.0 | 0.0
In the previous post, I was shown how to split out the languages and then group by them while taking the mean:
(df[['Score', 'Language']]
    .assign(Language=lambda x: x.Language.str.split(','))
    .explode('Language')
    .groupby('Language')
    .Score.mean()
    .reset_index())
Which spits out:
Language Score
0 English 8.333333
1 French 8.333333
2 German 8.500000
3 Italian 8.500000
4 Japanese 8.000000
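I then tried to apply the same logic to the Name column, but selecting only the top x rows for each language. To be clear, I don't want every language included in a single statement. My goal is to run it for one language at a time.
So, for English, it would select the top x names, ordered by Score from highest to lowest. The expected output looks like this: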
Language Top X
0 English Phil
1 English Sarah
2 English Andes, The
3 English William
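I believe that if the values weren't joined with ',' I could use a combination of head(x) and .sort_values(ascending=False), so the part I'm stuck on is the .assign(Language=lambda x: x.Language.str.split(',')) step, which I think is required (but happy to be wrong!).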
Can anyone help me?
Cheers
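I think explode is necessary in the first step, to split the values by ','. In the next step, sort by both columns (Language as well as Score) with DataFrame.sort_values, then take the top rows per group with GroupBy.head: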
df1 = df.assign(Language=lambda x: x.Language.str.split(',')).explode('Language')
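To see what that first step does, here is a minimal sketch on a two-row frame. The frame name demo is only for illustration; the values are copied from the question's data.

```python
import pandas as pd

# Tiny illustrative frame (values copied from the question).
demo = pd.DataFrame({
    'Name': ['Andes, The', 'William'],
    'Language': ['English,Japanese,French', 'English,Spanish,French'],
})

# str.split turns each string into a list of languages;
# explode then emits one row per list element, repeating the
# other columns and the original index label.
exploded = (demo.assign(Language=demo['Language'].str.split(','))
                .explode('Language'))
print(exploded)
```

Each name now appears once per language, which is what makes the later groupby on Language possible.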
Then your solution can be simplified to:
df0 = df1.groupby('Language', as_index=False).Score.mean()
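One caveat: in the sample data Score is stored as strings ('8', '8.5', ...), so depending on the pandas version the mean may fail, and sorting compares the values as text rather than as numbers. A minimal sketch of the same aggregation, assuming the column can be parsed with pd.to_numeric first (only the Score and Language columns from the question are reproduced here):

```python
import pandas as pd

# Score and Language values copied from the question's four rows.
df = pd.DataFrame({
    'Score': ['8', '8.5', '9', '6'],
    'Language': ['English,Japanese,French',
                 'English,German,French,Italian',
                 'English,German,French,Italian',
                 'English,Spanish,French'],
})

# Convert Score to a numeric dtype, then split and explode Language.
df1 = (df.assign(Score=pd.to_numeric(df['Score']),
                 Language=df['Language'].str.split(','))
         .explode('Language'))

# Mean score per language, with Language kept as a regular column.
df0 = df1.groupby('Language', as_index=False)['Score'].mean()
print(df0)
```

If your real column contains values that are not numbers, pd.to_numeric accepts errors='coerce' to turn them into NaN instead of raising.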
And for the top N names per language, use:
N = 5
df2 = (df1.sort_values(['Language', 'Score'], ascending=False)
          .set_index('Language')
          .groupby('Language')['Name']
          .head(N)
          .reset_index(name=f'Top {N}'))
print(df2)
Language Top 5
0 Spanish William
1 Japanese Andes, The
2 Italian Sarah
3 Italian Phil
4 German Sarah
5 German Phil
6 French Sarah
7 French Phil
8 French Andes, The
9 French William
10 English Sarah
11 English Phil
12 English Andes, The
13 English William
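Since the question asks to run one language at a time, here is a rough sketch of that variant. The helper name top_names and its parameters are hypothetical, not part of pandas, and it assumes Score can be converted with pd.to_numeric so the ordering is numeric rather than textual.

```python
import pandas as pd

def top_names(df, language, n):
    """Hypothetical helper: top-n names for one language, highest Score first."""
    exploded = (df.assign(Score=pd.to_numeric(df['Score']),
                          Language=df['Language'].str.split(','))
                  .explode('Language'))
    return (exploded.loc[exploded['Language'].eq(language)]  # keep one language
                    .sort_values('Score', ascending=False)   # best score first
                    .head(n)[['Language', 'Name']]
                    .rename(columns={'Name': f'Top {n}'})
                    .reset_index(drop=True))

# Name, Score and Language values copied from the question's four rows.
df = pd.DataFrame({
    'Name': ['Andes, The', 'Phil', 'Sarah', 'William'],
    'Score': ['8', '8.5', '9', '6'],
    'Language': ['English,Japanese,French',
                 'English,German,French,Italian',
                 'English,German,French,Italian',
                 'English,Spanish,French'],
})

english = top_names(df, 'English', 3)
print(english)
```

Filtering before sorting keeps the call limited to a single language, so you can invoke it separately for 'English', 'French', and so on.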