Merging pandas DataFrames on multiple reference columns



I'm trying to combine two pandas DataFrames so that the first one is updated with values from the second. Here are samples of both DataFrames:

df1

      state       candidate
year
2016  CALIFORNIA  CLINTON, HILLARY
2016  CALIFORNIA  TRUMP, DONALD J.
2016  CALIFORNIA  JOHNSON, GARY
2016  CALIFORNIA  STEIN, JILL
2016  CALIFORNIA  WRITE-IN
2016  CALIFORNIA  LA RIVA, GLORIA ESTELLA
2016  TEXAS       TRUMP, DONALD J.
2016  TEXAS       CLINTON, HILLARY
2016  TEXAS       JOHNSON, GARY
2016  TEXAS       STEIN, JILL
...
1988  CALIFORNIA  BUSH, GEORGE H.W.
1988  CALIFORNIA  DUKAKIS, MICHAEL
1988  CALIFORNIA  PAUL, RONALD ""RON""
1988  CALIFORNIA  FULANI, LENORA
1988  TEXAS       BUSH, GEORGE H.W.
1988  TEXAS       DUKAKIS, MICHAEL
1988  TEXAS       PAUL, RONALD ""RON""
1988  TEXAS       FULANI, LENORA

df2

year                                                                            
1988  CALIFORNIA             47
1988  TEXAS                  29
...  
2016  CALIFORNIA             55
2016  TEXAS                  38

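There are values for every election year from 2020 back to 1972, in the same form, for all candidates and all states. df1 has other columns as well, but they aren't relevant to what I'm trying to do.

My expected result is: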

      state       candidate
year
2016  CALIFORNIA  CLINTON, HILLARY         55
2016  CALIFORNIA  TRUMP, DONALD J.         55
2016  CALIFORNIA  JOHNSON, GARY            55
2016  CALIFORNIA  STEIN, JILL              55
2016  CALIFORNIA  WRITE-IN                 55
2016  CALIFORNIA  LA RIVA, GLORIA ESTELLA  55
2016  TEXAS       TRUMP, DONALD J.         38
2016  TEXAS       CLINTON, HILLARY         38
2016  TEXAS       JOHNSON, GARY            38
2016  TEXAS       STEIN, JILL              38
...
1988  CALIFORNIA  BUSH, GEORGE H.W.        47
1988  CALIFORNIA  DUKAKIS, MICHAEL         47
1988  CALIFORNIA  PAUL, RONALD ""RON""     47
1988  CALIFORNIA  FULANI, LENORA           47
1988  TEXAS       BUSH, GEORGE H.W.        29
1988  TEXAS       DUKAKIS, MICHAEL         29
1988  TEXAS       PAUL, RONALD ""RON""     29
1988  TEXAS       FULANI, LENORA           29

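I want to match the electoral_votes column in df2 against the year and state columns in df1, so that each row gets the correct value. I got some help and was able to do the match when only one column is involved (you can see the question and answer here), but I'm struggling to match on two reference points (year and state). If I use the code from the link, it returns an error:

pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

I've tried apply, map, applymap, merge, and so on, but I couldn't figure it out. Thanks in advance for your help!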

I believe what you're looking for is a left merge (how='left'). In on=[...], list the common columns the merge should be based on.

# Imports
import pandas as pd

# Specify both key columns in "on".
pd.merge(df1,
         df2,
         how='left',
         on=['year', 'state'])
Out[1821]:

    year       state                candidate  votes
0   2016  CALIFORNIA         CLINTON, HILLARY     55
1   2016  CALIFORNIA         TRUMP, DONALD J.     55
2   2016  CALIFORNIA            JOHNSON, GARY     55
3   2016  CALIFORNIA              STEIN, JILL     55
4   2016  CALIFORNIA                 WRITE-IN     55
5   2016  CALIFORNIA  LA RIVA, GLORIA ESTELLA     55
6   2016       TEXAS         TRUMP, DONALD J.     38
7   2016       TEXAS         CLINTON, HILLARY     38
8   2016       TEXAS            JOHNSON, GARY     38
9   2016       TEXAS              STEIN, JILL     38
10  1988  CALIFORNIA        BUSH, GEORGE H.W.     47
11  1988  CALIFORNIA         DUKAKIS, MICHAEL     47
12  1988  CALIFORNIA     PAUL, RONALD ""RON""     47
13  1988  CALIFORNIA           FULANI, LENORA     47
14  1988       TEXAS        BUSH, GEORGE H.W.     29
15  1988       TEXAS         DUKAKIS, MICHAEL     29
16  1988       TEXAS     PAUL, RONALD ""RON""     29
17  1988       TEXAS           FULANI, LENORA     29
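To try the merge without the full dataset, here is a minimal sketch that rebuilds a few of the rows shown above by hand. It assumes year and state are ordinary columns in both frames and that the vote column is named votes, as in the output above. The validate="many_to_one" argument is an optional extra that is not part of the original answer: it makes pandas raise an error if df2 contains a duplicate (year, state) pair.

```python
import pandas as pd

# Small stand-ins for the question's frames (a subset of the rows shown above).
df1 = pd.DataFrame({
    "year": [2016, 2016, 1988, 1988],
    "state": ["CALIFORNIA", "TEXAS", "CALIFORNIA", "TEXAS"],
    "candidate": ["CLINTON, HILLARY", "TRUMP, DONALD J.",
                  "BUSH, GEORGE H.W.", "DUKAKIS, MICHAEL"],
})
df2 = pd.DataFrame({
    "year": [1988, 1988, 2016, 2016],
    "state": ["CALIFORNIA", "TEXAS", "CALIFORNIA", "TEXAS"],
    "votes": [47, 29, 55, 38],
})

# Left merge on both keys: every df1 row is kept and picks up the
# votes value whose (year, state) pair matches.
merged = pd.merge(df1, df2, how="left", on=["year", "state"],
                  validate="many_to_one")
print(merged)
```

Because the merge is a left merge, the result has one row per row of df1, in df1's original order.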

The same merge can also be written as:

pd.merge(df1,
         df2,
         how='left',
         left_on=['year', 'state'],
         right_on=['year', 'state'])

But since the key columns have the same names in both DataFrames, we can simply use on=['year', 'state'].
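The left_on/right_on form becomes necessary when the key columns are named differently in the two frames. The sketch below illustrates that case; the column names election_year and state_name are hypothetical and chosen only for the example, since in the question both frames use year and state.

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": [2016, 1988],
    "state": ["TEXAS", "TEXAS"],
    "candidate": ["TRUMP, DONALD J.", "BUSH, GEORGE H.W."],
})
# Hypothetical key names, used only to show keys that differ between frames.
df2_renamed = pd.DataFrame({
    "election_year": [1988, 2016],
    "state_name": ["TEXAS", "TEXAS"],
    "votes": [29, 38],
})

merged = pd.merge(df1, df2_renamed, how="left",
                  left_on=["year", "state"],
                  right_on=["election_year", "state_name"])

# Both sets of key columns survive the merge, so drop the right-hand copies.
merged = merged.drop(columns=["election_year", "state_name"])
print(merged)
```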

Another way to write it:

merged_df = df1.merge(df2, on=['year', 'state'], how='left')
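The printouts in the question show year as the index rather than as a regular column. One way to handle that, sketched here under the assumption that year is a named index on both frames, is to move it into a column with reset_index() before merging and restore it afterwards. The sample rows are a hand-built subset of the question's data, and the column name votes follows the output above.

```python
import pandas as pd

# Frames shaped like the question's printouts: "year" is the index.
df1 = pd.DataFrame(
    {"state": ["CALIFORNIA", "TEXAS", "TEXAS"],
     "candidate": ["STEIN, JILL", "STEIN, JILL", "FULANI, LENORA"]},
    index=pd.Index([2016, 2016, 1988], name="year"),
)
df2 = pd.DataFrame(
    {"state": ["TEXAS", "CALIFORNIA", "TEXAS"],
     "votes": [29, 55, 38]},
    index=pd.Index([1988, 2016, 2016], name="year"),
)

merged_df = (
    df1.reset_index()                       # year becomes a column
       .merge(df2.reset_index(), on=["year", "state"], how="left")
       .set_index("year")                   # restore the original layout
)
print(merged_df)
```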

If you only want to use three of the columns in df1:

df1 = pd.read_csv('<name_of_the_CSV_file>', usecols=['year', 'state', 'candidate'])
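As a quick check of what usecols does, the sketch below reads from an in-memory string instead of a file. The CSV contents, including the extra party column, are made up for illustration and stand in for whatever other columns the real file has.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file; "party" is a hypothetical extra column.
csv_text = """year,state,candidate,party
2016,TEXAS,"TRUMP, DONALD J.",REPUBLICAN
2016,TEXAS,"CLINTON, HILLARY",DEMOCRAT
"""

# Only the listed columns are loaded; "party" is skipped at read time.
df1 = pd.read_csv(io.StringIO(csv_text),
                  usecols=["year", "state", "candidate"])
print(df1)
```

Dropping unused columns at read time keeps them out of the merged result without a separate column-selection step.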
