Python: how to select the rows with the maximum value per DataFrame group (allowing multiple rows for ties)



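Apologies for the messy post; this is my first one.

Background: within each state, for each pollster (each state has several polls), I want to select the candidate with the highest share of the vote: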

import pandas as pd
data = {'State': ['Texas','Texas','Texas','Texas',
'New York','New York',
'Pennsylvania','Pennsylvania','Pennsylvania',
'Pennsylvania','Pennsylvania','Pennsylvania'],
'Pollster': ['Chuck Norris','Chuck Norris','Mike Jones','Mike Jones',
'Sterling Cooper','Sterling Cooper',
'Yinz','Yinz','Yinz','Wawa','Wawa','Wawa'],
'Party': ['Thems','RIPs','Thems','RIPs',
'Thems','RIPs',
'Thems','RIPs','LIBOR',
'Thems','RIPs','LIBOR'],
'Percentage of Vote' : [0.45, 0.55, 0.43, 0.57,
.99,.01,
.5,.5,0,
1/3,1/3,1/3]}
df = pd.DataFrame(data)

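The problem is that in Pennsylvania, Yinz's poll has a two-way tie and Wawa's poll has a three-way tie. How can I select the candidate with the highest share of the vote in each group (a poll within a given state), selecting multiple candidates when there is a tie? Here is the original data: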

    State         Pollster         Party  Percentage of Vote
0   Texas         Chuck Norris     Thems  0.450000
1   Texas         Chuck Norris     RIPs   0.550000
2   Texas         Mike Jones       Thems  0.430000
3   Texas         Mike Jones       RIPs   0.570000
4   New York      Sterling Cooper  Thems  0.990000
5   New York      Sterling Cooper  RIPs   0.010000
6   Pennsylvania  Yinz             Thems  0.500000
7   Pennsylvania  Yinz             RIPs   0.500000
8   Pennsylvania  Yinz             LIBOR  0.000000
9   Pennsylvania  Wawa             Thems  0.333333
10  Pennsylvania  Wawa             RIPs   0.333333
11  Pennsylvania  Wawa             LIBOR  0.333333

Here is the desired output:

    State         Pollster         Party  Percentage of Vote
1   Texas         Chuck Norris     RIPs   0.550000
3   Texas         Mike Jones       RIPs   0.570000
4   New York      Sterling Cooper  Thems  0.990000
6   Pennsylvania  Yinz             Thems  0.500000
7   Pennsylvania  Yinz             RIPs   0.500000
9   Pennsylvania  Wawa             Thems  0.333333
10  Pennsylvania  Wawa             RIPs   0.333333
11  Pennsylvania  Wawa             LIBOR  0.333333

Note that the top candidate of each poll is kept, and multiple candidates appear for a poll only when they are tied.

I tried using:

df.groupby(['State', 'Pollster'])

to group by poll within a state, but I don't know what to do next.

Thanks!

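You do need a groupby, and then you need to find the rows that hold the maximum "Percentage of Vote" within each group; filtering on those rows gives the desired result. The following code does that: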

# Boolean mask: True where a row's value equals its group's maximum (ties included)
idx = df.groupby(['State','Pollster'])['Percentage of Vote'].transform('max') == df['Percentage of Vote']
df1 = df[idx]
# Output of df1:
           State         Pollster  Party  Percentage of Vote
1          Texas     Chuck Norris   RIPs            0.550000
3          Texas       Mike Jones   RIPs            0.570000
4       New York  Sterling Cooper  Thems            0.990000
6   Pennsylvania             Yinz  Thems            0.500000
7   Pennsylvania             Yinz   RIPs            0.500000
9   Pennsylvania             Wawa  Thems            0.333333
10  Pennsylvania             Wawa   RIPs            0.333333
11  Pennsylvania             Wawa  LIBOR            0.333333
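
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The helper name `top_per_group` is made up for illustration and is not part of pandas. The block also shows one possible alternative, ranking within each group with `rank(method='min', ascending=False)`, which should select the same rows because all tied leaders receive rank 1.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'State': ['Texas'] * 4 + ['New York'] * 2 + ['Pennsylvania'] * 6,
    'Pollster': ['Chuck Norris', 'Chuck Norris', 'Mike Jones', 'Mike Jones',
                 'Sterling Cooper', 'Sterling Cooper',
                 'Yinz', 'Yinz', 'Yinz', 'Wawa', 'Wawa', 'Wawa'],
    'Party': ['Thems', 'RIPs', 'Thems', 'RIPs',
              'Thems', 'RIPs',
              'Thems', 'RIPs', 'LIBOR',
              'Thems', 'RIPs', 'LIBOR'],
    'Percentage of Vote': [0.45, 0.55, 0.43, 0.57,
                           .99, .01,
                           .5, .5, 0,
                           1/3, 1/3, 1/3],
})


def top_per_group(frame, keys, value):
    """Hypothetical helper: keep every row whose `value` equals its group's max."""
    # transform('max') broadcasts each group's maximum back onto its rows,
    # so the comparison yields a boolean mask aligned with the original index.
    group_max = frame.groupby(keys)[value].transform('max')
    return frame[frame[value] == group_max]


winners = top_per_group(df, ['State', 'Pollster'], 'Percentage of Vote')

# Alternative sketch: rank within each group; with method='min' and
# ascending=False, every row tied for the highest value gets rank 1.
ranks = (df.groupby(['State', 'Pollster'])['Percentage of Vote']
           .rank(method='min', ascending=False))
winners_by_rank = df[ranks == 1]

print(winners)
```

Both approaches keep the original row index, so the result can be compared directly against the desired output in the question. The equality comparison is exact here because the group maximum is one of the group's own values.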
