Merging two dataframes, grouped by the values of a column in one of them



I have the following dataframe:

df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home","Stay home", 'Go outside', "Go Outside","Go outside"],
'Child' : ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "sunny"]})
Parent      Child
0   Stay home   Severe weather
1   Stay home   Severe weather
2   Stay home   Severe weather
3   Go outside  Sunny
4   Go Outside  Sunny
5   Go outside  sunny

And a second one:

df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})
Similarity_Score
0   SimilarityScore:0.43693185876069784
1   SimilarityScore:0.299807821163373

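I want to merge the two dataframes based on the Child values of df1, so that every row in the same Child group gets the same score.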

Expected result:

Parent     Child           Similarity_Score
0   Stay home   Severe weather  0.43693185876069784
1   Stay home   Severe weather  0.43693185876069784
2   Stay home   Severe weather  0.43693185876069784
3   Go outside  Sunny           0.299807821163373
4   Go Outside  Sunny           0.299807821163373
5   Go outside  sunny           0.299807821163373

I tried the usual merge and concat methods, but could not find a solution. Any ideas?

If you want to assign the scores based on the values of Child, you could do this:

import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home", "Stay home", 'Go outside', "Go Outside", "Go outside"],
'Child': ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "Sunny"]})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})
# Split the string at : and convert to float
df2['Similarity_Score'] = df2['Similarity_Score'].str.split(':').str[1].astype(float)
# calculate auxiliary column position to base the matching on
df1['position'] = df1['Child'].apply(lambda row: np.where(df1['Child'].unique() == row)[0][0])
# merge both dataframes and drop auxiliary column position
df = df1.merge(df2, left_on='position', right_index=True).drop(columns=["position"])
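
Note that the snippet above uses "Sunny" in the last row, whereas the question has "sunny". With the question's data, "sunny" would count as a third unique value and its row would be dropped by the inner merge. A minimal sketch of a case-insensitive variant, assuming that the rows of df2 are listed in the order in which the Child groups first appear in df1 and that there is exactly one score per group (the names scores, codes and result are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home', 'Stay home', 'Stay home', 'Go outside', 'Go Outside', 'Go outside'],
    'Child': ['Severe weather', 'Severe weather', 'Severe weather', 'Sunny', 'Sunny', 'sunny'],
})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784',
                                         'SimilarityScore:0.299807821163373']})

# Keep only the numeric part of each score string and convert it to float
scores = df2['Similarity_Score'].str.split(':').str[1].astype(float)

# Number the Child groups in order of first appearance, ignoring case
# (assumption: 'Sunny' and 'sunny' are meant to be the same group)
codes, _ = pd.factorize(df1['Child'].str.lower())

# Look up the score of each row by its group number
# (assumption: df2 has one row per group, in order of first appearance)
result = df1.assign(Similarity_Score=scores.to_numpy()[codes])
```

Because pd.factorize computes the group numbers in one pass, this also avoids recomputing unique() for every row.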

Based on your reply, the merge is done on the index, after identifying the unique groups in df1:


# identifying the group
df1['key']=df1.groupby(['Parent','Child']).ngroup(ascending=False)
df1
# merge the two dataframes and, while merging, split the similarity score to keep only the numeric part
(df1.merge(df2['Similarity_Score'].str.split(':', expand=True)[1],
left_on='key', 
right_index=True)
.drop(columns='key'))
Parent  Child   1
0   Stay home   Severe weather  0.43693185876069784
1   Stay home   Severe weather  0.43693185876069784
2   Stay home   Severe weather  0.43693185876069784
3   Go outside  Sunny   0.299807821163373
4   Go outside  Sunny   0.299807821163373
5   Go outside  Sunny   0.299807821163373
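
In the output above the score column is named 1 and still holds strings. Also, the group numbers come from the sorted (Parent, Child) pairs in reverse, which only happens to line up with the order of df2 for this data. A hedged sketch that numbers the groups by first appearance of Child instead and keeps a named float column, assuming df2 follows that same order (the names scores and out are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home', 'Stay home', 'Stay home', 'Go outside', 'Go outside', 'Go outside'],
    'Child': ['Severe weather', 'Severe weather', 'Severe weather', 'Sunny', 'Sunny', 'Sunny'],
})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784',
                                         'SimilarityScore:0.299807821163373']})

# Numeric part of each score as a float Series with a proper name
scores = (df2['Similarity_Score'].str.split(':', expand=True)[1]
          .astype(float)
          .rename('Similarity_Score'))

# sort=False numbers the groups in order of first appearance
# (assumption: df2 rows follow that same order)
df1['key'] = df1.groupby('Child', sort=False).ngroup()

# Merge the group number against the index of the scores, then drop the helper column
out = (df1.merge(scores, left_on='key', right_index=True)
          .drop(columns='key'))
```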
