How to join/merge 2 DataFrames, keeping the values from the second DataFrame



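I have two DataFrames holding user details and scores. Some users have a second score and will appear in the second DataFrame. What I want to do is join or merge them (I don't mind which) to get each user's final score: if the user exists in the second DataFrame, take the row from there, otherwise take it from the first.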

DataFrame one:

  first_name last_name              email  score               feedback
0       Bill     First  user1@example.com   57.0     Needs to try again
1       Anna    Second  user2@example.com   10.0  Not enough to proceed
2       Fred     Third  user3@example.com  100.0             Excellent!

DataFrame two:

  first_name last_name              email  score      feedback
0       Bill     First  user1@example.com  100.0  Much better!

Expected result:

  first_name last_name              email  score               feedback
0       Bill     First  user1@example.com  100.0           Much better!
1       Anna    Second  user2@example.com   10.0  Not enough to proceed
2       Fred     Third  user3@example.com  100.0             Excellent!

The closest I have got is df1.merge(df2, on=['email', 'first_name', 'last_name'], how='left'), but that gives me the columns: first_name last_name email score_x feedback_x score_y feedback_y
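For reference, here is a minimal sketch of what that left merge produces and one possible way to collapse the suffixed columns into a single score and feedback. The sample frames are rebuilt from the tables above; the `_first`/`_second` suffixes and the `keys` variable are illustrative choices, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "first_name": ["Bill", "Anna", "Fred"],
    "last_name": ["First", "Second", "Third"],
    "email": ["user1@example.com", "user2@example.com", "user3@example.com"],
    "score": [57.0, 10.0, 100.0],
    "feedback": ["Needs to try again", "Not enough to proceed", "Excellent!"],
})
df2 = pd.DataFrame({
    "first_name": ["Bill"],
    "last_name": ["First"],
    "email": ["user1@example.com"],
    "score": [100.0],
    "feedback": ["Much better!"],
})

keys = ["email", "first_name", "last_name"]

# Explicit suffixes (an illustrative choice) instead of the default _x/_y.
merged = df1.merge(df2, on=keys, how="left", suffixes=("_first", "_second"))

# Prefer the second attempt; fall back to the first where it is missing.
for col in ["score", "feedback"]:
    merged[col] = merged[f"{col}_second"].fillna(merged[f"{col}_first"])

out = merged[["first_name", "last_name", "email", "score", "feedback"]]
print(out)
```

This keeps the row order of df1, at the cost of listing every value column by hand.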

Use pd.concat and drop_duplicates:

out = pd.concat([df2, df1]).drop_duplicates(['first_name', 'last_name', 'email'])
print(out)
# Output
  first_name last_name              email  score               feedback
0       Bill     First  user1@example.com  100.0           Much better!
1       Anna    Second  user2@example.com   10.0  Not enough to proceed
2       Fred     Third  user3@example.com  100.0             Excellent!

This assumes that the first_name, last_name and email fields together form the primary key that identifies a record, as in your merge.

Just before drop_duplicates, your DataFrame looks like this:

>>> pd.concat([df2, df1])
  first_name last_name              email  score               feedback
0       Bill     First  user1@example.com  100.0           Much better!
0       Bill     First  user1@example.com   57.0     Needs to try again  # dupe
1       Anna    Second  user2@example.com   10.0  Not enough to proceed
2       Fred     Third  user3@example.com  100.0             Excellent!
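The approach above relies on drop_duplicates keeping the first occurrence of each key by default, which is why df2 is placed first in the concat. Below is a self-contained sketch that spells this out with keep="first" and also resets the index; the sample frames are rebuilt from the question, and the `keys` and `stacked` names are only illustrative.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "first_name": ["Bill", "Anna", "Fred"],
    "last_name": ["First", "Second", "Third"],
    "email": ["user1@example.com", "user2@example.com", "user3@example.com"],
    "score": [57.0, 10.0, 100.0],
    "feedback": ["Needs to try again", "Not enough to proceed", "Excellent!"],
})
df2 = pd.DataFrame({
    "first_name": ["Bill"],
    "last_name": ["First"],
    "email": ["user1@example.com"],
    "score": [100.0],
    "feedback": ["Much better!"],
})

keys = ["first_name", "last_name", "email"]

# df2 goes first so that its rows win when a key appears in both frames.
stacked = pd.concat([df2, df1], ignore_index=True)

# keep="first" is the default; it is written out here to make the intent clear.
out = stacked.drop_duplicates(subset=keys, keep="first").reset_index(drop=True)
print(out)
```

Note that rows coming from df2 end up at the top of the result. With this sample data that happens to match the original order, but in general the output is not ordered like df1.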

An alternative using combine_first:

cols = ['email', 'first_name', 'last_name']
out = df2.set_index(cols).combine_first(df1.set_index(cols)).reset_index()

Output:

               email first_name last_name  score               feedback
0  user1@example.com       Bill     First  100.0           Much better!
1  user2@example.com       Anna    Second   10.0  Not enough to proceed
2  user3@example.com       Fred     Third  100.0             Excellent!
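One difference worth knowing: combine_first fills missing values cell by cell, whereas drop_duplicates keeps or discards whole rows. The sketch below rebuilds the sample frames from the question and then uses a hypothetical variant, `df2_partial`, in which Bill's second feedback is missing, to show the cell-level behaviour. That variant is not part of the original data.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "first_name": ["Bill", "Anna", "Fred"],
    "last_name": ["First", "Second", "Third"],
    "email": ["user1@example.com", "user2@example.com", "user3@example.com"],
    "score": [57.0, 10.0, 100.0],
    "feedback": ["Needs to try again", "Not enough to proceed", "Excellent!"],
})
df2 = pd.DataFrame({
    "first_name": ["Bill"],
    "last_name": ["First"],
    "email": ["user1@example.com"],
    "score": [100.0],
    "feedback": ["Much better!"],
})

cols = ["email", "first_name", "last_name"]

# Values from df2 take priority; gaps are filled from df1.
out = df2.set_index(cols).combine_first(df1.set_index(cols)).reset_index()
print(out)

# Hypothetical variant: the second attempt has a score but no feedback.
df2_partial = df2.copy()
df2_partial.loc[0, "feedback"] = None

out_partial = (
    df2_partial.set_index(cols)
    .combine_first(df1.set_index(cols))
    .reset_index()
)
# Bill keeps the new score, but the missing feedback is filled from df1.
print(out_partial)
```

If a second-attempt row should always replace the first-attempt row as a whole, even where some of its cells are empty, the concat and drop_duplicates approach is the safer choice.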
