I have a DataFrame (df1) with 30,000 rows:
id Name Age
1 Joey 22
2 Anna 34
3 Jon 33
4 Amy 30
5 Kay 22
and another DataFrame (df2) that has the same columns plus a Sport column, but with some ids missing:
id Name Age Sport
Jon 33 Tennis
5 Kay 22 Football
Joey 22 Basketball
4 Amy 30 Running
Anna 42 Dancing
I want the missing ids in df2 to be filled in from df1 by matching on the name, so that df2 becomes:
id Name Age Sport
3 Jon 33 Tennis
5 Kay 22 Football
1 Joey 22 Basketball
4 Amy 30 Running
2 Anna 42 Dancing
Can anyone help? I'm new to pandas and DataFrames.
You can use .map together with .fillna:
# Turn empty-string ids into NaN, fill them from df1 by Name, then cast back to int
df2['id'] = (
    df2['id'].replace('', np.nan)
    .fillna(df2['Name'].map(df1.set_index('Name')['id']))
    .astype(int)
)
print(df2)
id Name Age Sport
0 3 Jon 33 Tennis
1 5 Kay 22 Football
2 1 Joey 22 Basketball
3 4 Amy 30 Running
4 2 Anna 42 Dancing
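
For anyone who wants to try the .map / .fillna approach end to end, here is a minimal self-contained sketch built from the sample rows in the question. It makes two assumptions that the question does not confirm: the missing ids are stored as NaN (not empty strings), and each Name appears only once in df1. The name_to_id variable is just an illustrative helper.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Name": ["Joey", "Anna", "Jon", "Amy", "Kay"],
    "Age": [22, 34, 33, 30, 22],
})

# Assumption: missing ids are NaN rather than empty strings
df2 = pd.DataFrame({
    "id": [np.nan, 5, np.nan, 4, np.nan],
    "Name": ["Jon", "Kay", "Joey", "Amy", "Anna"],
    "Age": [33, 22, 22, 30, 42],
    "Sport": ["Tennis", "Football", "Basketball", "Running", "Dancing"],
})

# Lookup Series: index is Name, values are id (assumes Name is unique in df1)
name_to_id = df1.set_index("Name")["id"]

# Keep existing ids, fill only the missing ones from the lookup, then cast to int
df2["id"] = df2["id"].fillna(df2["Name"].map(name_to_id)).astype(int)

print(df2)
```

If a name in df2 has no match in df1, its id stays NaN and .astype(int) raises an error, so it may be worth checking df2["id"].isna().any() before the cast.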
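First, join the two DataFrames with pd.merge on your key columns. I assume the keys in this example are the name and age columns. Then use np.where together with .isnull() to find the empty id values in df2 and replace them with the ids from df1.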
df3 = pd.merge(df2, df1, on=['name', 'age'], how='left')
df2['id'] = np.where(df3.id_x.isnull(), df3.id_y, df3.id_x).astype(int)
id name age sport
0 1 Joey 22 Tennis
1 2 Anna 34 Football
2 3 Jon 33 Basketball
3 4 Amy 30 Running
4 5 Kay 22 Dancing
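
Here is a self-contained sketch of the merge approach using the sample rows from the question. Note that Anna's Age is 34 in df1 but 42 in df2 in the question's data, so merging on both name and age would leave her id unmatched. This sketch therefore merges on Name only, which is a deviation from the snippet above and assumes Name is unique in df1. The suffixes and the merged / filled variable names are illustrative choices, and missing ids are assumed to be NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Name": ["Joey", "Anna", "Jon", "Amy", "Kay"],
    "Age": [22, 34, 33, 30, 22],
})

# Assumption: missing ids are NaN rather than empty strings
df2 = pd.DataFrame({
    "id": [np.nan, 5, np.nan, 4, np.nan],
    "Name": ["Jon", "Kay", "Joey", "Amy", "Anna"],
    "Age": [33, 22, 22, 30, 42],
    "Sport": ["Tennis", "Football", "Basketball", "Running", "Dancing"],
})

# Left merge keeps every df2 row in its original order.
# validate="many_to_one" raises if a Name is duplicated in df1.
merged = df2.merge(
    df1[["id", "Name"]],
    on="Name",
    how="left",
    suffixes=("_df2", "_df1"),
    validate="many_to_one",
)

# Take the id from df1 only where df2's own id is missing
filled = np.where(merged["id_df2"].isnull(), merged["id_df1"], merged["id_df2"])
df2["id"] = filled.astype(int)

print(df2)
```

Compared with .map, the merge version is a bit more verbose but extends naturally to multi-column keys when those columns really do agree between the two frames.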