Python Pandas 数据帧交叉引用和新列生成



我想生成一个数据帧,其中包含一个人可能喜欢的颜色的蜡笔颜色列表。我有两个包含必要信息的数据帧:

df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])

我想通过将 df1 颜色条目与 df2 颜色条目匹配,并将相应的possible_crayons值作为列表返回 df1 中的新列,从而将一个数据库与另一个数据库进行引用。任何未找到匹配项的字词都将被标记为 N/A。因此,所需的输出将是:

person favorite_color possible_crayons_list  
Jeff   blue           [navy, aqua]  
Marie  purple         [periwinkle, royal purple]  
Jenna  brown          NaN  
Mike   green          [forest green, pink]

我试过:

mergedDF = pd.merge(df1, df2, how='left')

但是,这会导致以下结果:

person   color possible_crayons  
0   Jeff    blue             navy  
1   Jeff    blue             aqua  
2  Marie  purple       periwinkle  
3  Marie  purple     royal purple  
4  Jenna   brown              NaN  
5   Mike   green     forest green  
6   Mike   green             pine  

有什么方法可以实现我想要的列表输出吗?

我们可以将DataFrame.mergehow='left'一起使用,然后与as_index=False一起使用GroupBy.agg

new_df= ( df1.merge(df2,how='left',on='color')
.groupby(['color','person'],as_index=False).agg(list) )

输出

print(new_df)
color person            possible_crayons
0    blue   Jeff                [navy, aqua]
1   brown  Jenna                       [nan]
2   green   Mike        [forest green, pine]
3  purple  Marie  [periwinkle, royal purple]

使用这个:

df1 = pd.DataFrame({'person':['Jeff','Marie','Jenna','Mike'], 'color':['blue', 'purple', 'brown', 'green']}, columns=['person','color'])
df2 = pd.DataFrame({'possible_crayons':['christmas red','infra red','scarlet','sunset orange', 'neon carrot','lemon','forest green','pine','navy','aqua','periwinkle','royal purple'],'color':['red','red','red','orange','orange','yellow','green','green','blue','blue','purple','purple']}, columns=['possible_crayons','color'])
tmp = df2.groupby('color')['possible_crayons'].apply(list)
mergedDF = df1.merge(tmp, how='left', left_on='color', right_index=True)
print(mergedDF)

mergedDF2 = mergedDF.groupby('color')['possible_crayons'].apply(list).reset_index(name='new_possible_crayons')

最新更新