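Apologies for the messy post; this is my first one.
Background: in each state, for each pollster (each state has multiple polls), I want to select the candidate with the highest percentage of the vote: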
import pandas as pd

data = {'State': ['Texas', 'Texas', 'Texas', 'Texas',
                  'New York', 'New York',
                  'Pennsylvania', 'Pennsylvania', 'Pennsylvania',
                  'Pennsylvania', 'Pennsylvania', 'Pennsylvania'],
        'Pollster': ['Chuck Norris', 'Chuck Norris', 'Mike Jones', 'Mike Jones',
                     'Sterling Cooper', 'Sterling Cooper',
                     'Yinz', 'Yinz', 'Yinz', 'Wawa', 'Wawa', 'Wawa'],
        'Party': ['Thems', 'RIPs', 'Thems', 'RIPs',
                  'Thems', 'RIPs',
                  'Thems', 'RIPs', 'LIBOR',
                  'Thems', 'RIPs', 'LIBOR'],
        'Percentage of Vote': [0.45, 0.55, 0.43, 0.57,
                               .99, .01,
                               .5, .5, 0,
                               1/3, 1/3, 1/3]}
df = pd.DataFrame(data)
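The problem is that in Pennsylvania the Yinz poll has a two-way tie and the Wawa poll has a three-way tie. How can I select the candidate with the highest percentage of the vote in each group (a pollster within a given state), and keep more than one candidate when they are tied? Here is the original data: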
           State         Pollster  Party  Percentage of Vote
0          Texas     Chuck Norris  Thems            0.450000
1          Texas     Chuck Norris   RIPs            0.550000
2          Texas       Mike Jones  Thems            0.430000
3          Texas       Mike Jones   RIPs            0.570000
4       New York  Sterling Cooper  Thems            0.990000
5       New York  Sterling Cooper   RIPs            0.010000
6   Pennsylvania             Yinz  Thems            0.500000
7   Pennsylvania             Yinz   RIPs            0.500000
8   Pennsylvania             Yinz  LIBOR            0.000000
9   Pennsylvania             Wawa  Thems            0.333333
10  Pennsylvania             Wawa   RIPs            0.333333
11  Pennsylvania             Wawa  LIBOR            0.333333
Here is the desired output:
           State         Pollster  Party  Percentage of Vote
1          Texas     Chuck Norris   RIPs            0.550000
3          Texas       Mike Jones   RIPs            0.570000
4       New York  Sterling Cooper  Thems            0.990000
6   Pennsylvania             Yinz  Thems            0.500000
7   Pennsylvania             Yinz   RIPs            0.500000
9   Pennsylvania             Wawa  Thems            0.333333
10  Pennsylvania             Wawa   RIPs            0.333333
11  Pennsylvania             Wawa  LIBOR            0.333333
Note that the top candidate in each poll is kept, and a poll shows more than one candidate only when they are tied.
I tried using:
df.groupby(['State', 'Pollster'])
to group by pollster within a state, but I don't know what to do next.
Thanks!
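You do need a groupby: compute the maximum 'Percentage of Vote' within each (State, Pollster) group and keep the rows whose value equals that maximum. Because this builds a boolean mask by comparing values instead of picking a single row index, all tied rows are kept. The following code does this: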
idx = df.groupby(['State','Pollster'])['Percentage of Vote'].transform('max') == df['Percentage of Vote']
df1 = df[idx]
# output of df1:
           State         Pollster  Party  Percentage of Vote
1          Texas     Chuck Norris   RIPs            0.550000
3          Texas       Mike Jones   RIPs            0.570000
4       New York  Sterling Cooper  Thems            0.990000
6   Pennsylvania             Yinz  Thems            0.500000
7   Pennsylvania             Yinz   RIPs            0.500000
9   Pennsylvania             Wawa  Thems            0.333333
10  Pennsylvania             Wawa   RIPs            0.333333
11  Pennsylvania             Wawa  LIBOR            0.333333
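For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the question's data in a more compact form, applies the transform-based mask from the answer, and also shows an alternative using a within-group rank. The variable names (`group_cols`, `votes`, `is_max`, `winners`, `is_top_rank`, `winners_by_rank`) are my own, and the rank variant is not from the original answer; treat it as one possible alternative, not the only way to do it.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df = pd.DataFrame({
    'State': ['Texas'] * 4 + ['New York'] * 2 + ['Pennsylvania'] * 6,
    'Pollster': (['Chuck Norris'] * 2 + ['Mike Jones'] * 2
                 + ['Sterling Cooper'] * 2 + ['Yinz'] * 3 + ['Wawa'] * 3),
    'Party': ['Thems', 'RIPs'] * 3 + ['Thems', 'RIPs', 'LIBOR'] * 2,
    'Percentage of Vote': [0.45, 0.55, 0.43, 0.57, 0.99, 0.01,
                           0.5, 0.5, 0, 1/3, 1/3, 1/3],
})

group_cols = ['State', 'Pollster']          # one group per poll within a state
votes = df.groupby(group_cols)['Percentage of Vote']

# Approach from the answer: transform('max') returns a Series aligned with df
# that holds each row's group maximum, so the comparison is True for every
# row that ties for first place in its group.
is_max = df['Percentage of Vote'] == votes.transform('max')
winners = df[is_max]

# Alternative (my assumption, not from the answer): rank rows within each
# group in descending order; method='min' gives every tied leader rank 1.
is_top_rank = votes.rank(method='min', ascending=False) == 1
winners_by_rank = df[is_top_rank]

print(winners)
```

Both masks select the same rows here. The equality comparison on floats is safe in this case because the group maximum is copied from the column itself rather than recomputed, so tied values compare exactly equal.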