I am trying to combine two pandas DataFrames so that the first one is updated with values looked up from the second. Here are samples of the two DataFrames:
df1
state candidate
year
2016 CALIFORNIA CLINTON, HILLARY
2016 CALIFORNIA TRUMP, DONALD J.
2016 CALIFORNIA JOHNSON, GARY
2016 CALIFORNIA STEIN, JILL
2016 CALIFORNIA WRITE-IN
2016 CALIFORNIA LA RIVA, GLORIA ESTELLA
2016 TEXAS TRUMP, DONALD J.
2016 TEXAS CLINTON, HILLARY
2016 TEXAS JOHNSON, GARY
2016 TEXAS STEIN, JILL
...
state candidate
year
1988 CALIFORNIA BUSH, GEORGE H.W.
1988 CALIFORNIA DUKAKIS, MICHAEL
1988 CALIFORNIA PAUL, RONALD ""RON""
1988 CALIFORNIA FULANI, LENORA
1988 TEXAS BUSH, GEORGE H.W.
1988 TEXAS DUKAKIS, MICHAEL
1988 TEXAS PAUL, RONALD ""RON""
1988 TEXAS FULANI, LENORA
df2
state electoral_votes
year
1988 CALIFORNIA 47
1988 TEXAS 29
...
2016 CALIFORNIA 55
2016 TEXAS 38
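There are values in this same form for every election year from 2020 back to 1972, covering all candidates and all states. df1 has other columns as well, but they are not relevant to what I am trying to do.
My expected result is: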
state candidate electoral_votes
year
2016 CALIFORNIA CLINTON, HILLARY 55
2016 CALIFORNIA TRUMP, DONALD J. 55
2016 CALIFORNIA JOHNSON, GARY 55
2016 CALIFORNIA STEIN, JILL 55
2016 CALIFORNIA WRITE-IN 55
2016 CALIFORNIA LA RIVA, GLORIA ESTELLA 55
2016 TEXAS TRUMP, DONALD J. 38
2016 TEXAS CLINTON, HILLARY 38
2016 TEXAS JOHNSON, GARY 38
2016 TEXAS STEIN, JILL 38
...
state candidate electoral_votes
year
1988 CALIFORNIA BUSH, GEORGE H.W. 47
1988 CALIFORNIA DUKAKIS, MICHAEL 47
1988 CALIFORNIA PAUL, RONALD ""RON"" 47
1988 CALIFORNIA FULANI, LENORA 47
1988 TEXAS BUSH, GEORGE H.W. 29
1988 TEXAS DUKAKIS, MICHAEL 29
1988 TEXAS PAUL, RONALD ""RON"" 29
1988 TEXAS FULANI, LENORA 29
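I want to match the electoral_votes column in df2 against the year and state columns in df1 so that each row gets the correct value. I got some help and was able to do the match when only one column is involved (you can see that question and answer here), but I am struggling to match on two reference points (year and state). If I use the code from the linked answer, it raises an error:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
I have tried apply, map, applymap, merge, and so on, but could not figure it out. Thanks in advance for your help!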
I believe what you are looking for is a left merge. Pass the common columns that the merge should be based on as a list to on=[...].
# Imports
import pandas as pd
# Merge on both key columns by passing a list to "on".
pd.merge(df1,
         df2,
         how='left',
         on=['year', 'state'])
Out[1821]:
year state candidate votes
0 2016 CALIFORNIA CLINTON, HILLARY 55
1 2016 CALIFORNIA TRUMP, DONALD J. 55
2 2016 CALIFORNIA JOHNSON, GARY 55
3 2016 CALIFORNIA STEIN, JILL 55
4 2016 CALIFORNIA WRITE-IN 55
5 2016 CALIFORNIA LA RIVA, GLORIA ESTELLA 55
6 2016 TEXAS TRUMP, DONALD J. 38
7 2016 TEXAS CLINTON, HILLARY 38
8 2016 TEXAS JOHNSON, GARY 38
9 2016 TEXAS STEIN, JILL 38
10 1988 CALIFORNIA BUSH, GEORGE H.W. 47
11 1988 CALIFORNIA DUKAKIS, MICHAEL 47
12 1988 CALIFORNIA PAUL, RONALD ""RON"" 47
13 1988 CALIFORNIA FULANI, LENORA 47
14 1988 TEXAS BUSH, GEORGE H.W. 29
15 1988 TEXAS DUKAKIS, MICHAEL 29
16 1988 TEXAS PAUL, RONALD ""RON"" 29
17 1988 TEXAS FULANI, LENORA 29
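For anyone who wants to try this without the full dataset, here is a minimal sketch built from a few of the rows shown in the question. It assumes the value column in df2 is named electoral_votes (as the question describes it) and that year is the index in both frames, which is how the samples are printed. Calling reset_index() first turns year into an ordinary column, so the merge keys are plain columns on both sides.

```python
import pandas as pd

# Small sample taken from the rows shown in the question; "year" is the index.
df1 = pd.DataFrame(
    {
        "year": [2016, 2016, 2016, 1988, 1988],
        "state": ["CALIFORNIA", "CALIFORNIA", "TEXAS", "CALIFORNIA", "TEXAS"],
        "candidate": [
            "CLINTON, HILLARY",
            "TRUMP, DONALD J.",
            "TRUMP, DONALD J.",
            "BUSH, GEORGE H.W.",
            "DUKAKIS, MICHAEL",
        ],
    }
).set_index("year")

# One row per (year, state) pair; the column name is an assumption.
df2 = pd.DataFrame(
    {
        "year": [1988, 1988, 2016, 2016],
        "state": ["CALIFORNIA", "TEXAS", "CALIFORNIA", "TEXAS"],
        "electoral_votes": [47, 29, 55, 38],
    }
).set_index("year")

# Move "year" out of the index, then left-merge on both key columns.
merged = pd.merge(
    df1.reset_index(),
    df2.reset_index(),
    how="left",
    on=["year", "state"],
)
print(merged)
```

A left merge keeps every row of df1 in its original order and repeats the matching df2 value for each candidate in the same year and state.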
The code above can also be written as:
pd.merge(df1,
         df2,
         how='left',
         left_on=['year', 'state'],
         right_on=['year', 'state'])
But since the key columns have the same names in both DataFrames, we can simply use on=['year', 'state'].
Another way to write it:
merged_df = df1.merge(df2, on=['year', 'state'], how='left')
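If it helps, the same call can also check the merge for you. This is a sketch rather than part of the original answer: validate="many_to_one" asks pandas to raise an error if a (year, state) pair appears more than once in df2, and indicator=True adds a _merge column showing which rows found a match. The sample frames and the OHIO row are made up for illustration, and electoral_votes is the assumed column name.

```python
import pandas as pd

# Illustrative data only; the OHIO row deliberately has no match in df2.
df1 = pd.DataFrame(
    {
        "year": [2016, 2016, 2016],
        "state": ["CALIFORNIA", "TEXAS", "OHIO"],
        "candidate": ["CLINTON, HILLARY", "TRUMP, DONALD J.", "STEIN, JILL"],
    }
)
df2 = pd.DataFrame(
    {
        "year": [2016, 2016],
        "state": ["CALIFORNIA", "TEXAS"],
        "electoral_votes": [55, 38],
    }
)

merged_df = df1.merge(
    df2,
    on=["year", "state"],
    how="left",
    validate="many_to_one",  # raises if df2 has duplicate (year, state) keys
    indicator=True,          # adds a "_merge" column: both / left_only / right_only
)

# Rows of df1 that found no partner in df2 (their electoral_votes is NaN).
unmatched = merged_df[merged_df["_merge"] == "left_only"]
print(unmatched)
```

Unmatched rows get NaN, so the electoral_votes column becomes float in that case; checking the unmatched rows first is an easy way to spot spelling differences in state names.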
If you only want to use three of the columns in df1, select them when reading the CSV:
df1 = pd.read_csv('<name_of_the_CSV_file>', usecols=['year', 'state', 'candidate'])
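To see what usecols does without a file on disk, here is a small sketch that reads from an in-memory string instead. The CSV text and the extra party column are hypothetical stand-ins for whatever other columns the real file contains.

```python
import io

import pandas as pd

# Hypothetical CSV contents; "party" stands in for any column you want to skip.
csv_text = """year,state,candidate,party
2016,CALIFORNIA,"CLINTON, HILLARY",DEMOCRAT
2016,TEXAS,"TRUMP, DONALD J.",REPUBLICAN
"""

# Only the listed columns are parsed; everything else is ignored.
df1 = pd.read_csv(io.StringIO(csv_text), usecols=["year", "state", "candidate"])
print(df1.columns.tolist())
```

Dropping unneeded columns at read time keeps the merged result small and avoids carrying unrelated data through the merge.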