我想生成一个数据帧,其中包含一个人可能喜欢的颜色的蜡笔颜色列表。我有两个包含必要信息的数据帧:
df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])
我想通过将 df1 颜色条目与 df2 颜色条目匹配,并将相应的possible_crayons值作为列表返回 df1 中的新列,从而将一个数据库与另一个数据库进行引用。任何未找到匹配项的字词都将被标记为 N/A。因此,所需的输出将是:
person favorite_color possible_crayons_list
Jeff blue [navy, aqua]
Marie purple [periwinkle, royal purple]
Jenna brown NaN
Mike green [forest green, pink]
我试过:
mergedDF = pd.merge(df1, df2, how='left')
但是,这会导致以下结果:
person color possible_crayons
0 Jeff blue navy
1 Jeff blue aqua
2 Marie purple periwinkle
3 Marie purple royal purple
4 Jenna brown NaN
5 Mike green forest green
6 Mike green pine
有什么方法可以实现我想要的列表输出吗?
我们可以将DataFrame.merge
与how='left'
一起使用,然后与as_index=False
一起使用GroupBy.agg
:
new_df= ( df1.merge(df2,how='left',on='color')
.groupby(['color','person'],as_index=False).agg(list) )
输出
print(new_df)
color person possible_crayons
0 blue Jeff [navy, aqua]
1 brown Jenna [nan]
2 green Mike [forest green, pine]
3 purple Marie [periwinkle, royal purple]
使用这个:
df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])
tmp = df2.groupby('color')['possible_crayons'].apply(list)
mergedDF = df1.merge(tmp, how='left', left_on='color', right_index=True)
print(mergedDF)
mergedDF2 = mergedDF.groupby('color')['possible_crayons'].apply(list).reset_index(name='new_possible_crayons')