我有两个从 excel 读取的数据框。
df1
SRD CIVF Test Case
0 9530n3678n549 CIV-016
1 9979n9980 CIV-040
2 5231n4455 CIV-177
df2
SRD SRD CR
0 549n9980 CR181
1 4455 CR170
2 5231n9979 CR190
对于 df1,我想添加第三列,该列将指示引用与 SRD 列中列出的相同 SRD 的"SRD CR"编号。不确定我应该使用"地图"还是"添加"熊猫功能。df3 中显示的数据帧基本上是我正在寻找的。DF3 将写回 excel 中。另外,我希望在同一单元格(对于Excel(中保留多个值会很棘手。
df3
SRD CIVF Test Case SRD CR
0 9530n3678n549 CIV-016 CR181
1 9979n9980 CIV-040 CR190nCR181
2 5231n4455 CIV-177 CR170nCR190
除非有充分的理由将SRD
值隐藏在单行的字符串中,否则我会转换df1
和df2
,以便每一行都有一个SRD
值。然后你可以在SRD
上合并:
# Split all strings between 'n' into their own columns
split1 = df1['SRD'].str.split('\\n', expand=True)
split2 = df2['SRD'].str.split('\\n', expand=True)
split1
0 1 2
0 9530 3678 549
1 9979 9980 None
2 5231 4455 None
split2
0 1
0 549 9980
1 4455 None
2 5231 9979
# Concatenate the above split columns onto the right sides of
# the original DFs
catted1 = pd.concat([df1, split1], axis=1)
catted2 = pd.concat([df2, split2], axis=1)
catted1
SRD CIVF Test Case 0 1 2
0 9530n3678n549 NaN CIV-016 9530 3678 549
1 9979n9980 NaN CIV-040 9979 9980 None
2 5231n4455 NaN CIV-177 5231 4455 None
catted2
SRD SRD CR 0 1
0 549n9980 CR181 549 9980
1 4455 CR170 4455 None
2 5231n9979 CR190 5231 9979
# Give each SRD its own row
melted1 = pd.melt(catted1,
id_vars=['CIVF', 'Test Case', 'SRD'],
value_name='Shared_SRDs')
.drop('variable', axis=1).dropna(subset=['Shared_SRDs'])
melted2 = pd.melt(catted2.drop('SRD', axis=1),
id_vars=['SRD CR'],
value_name='Shared_SRDs')
.drop('variable', axis=1).dropna()
melted1
CIVF Test Case SRD
0 NaN CIV-016 9530
1 NaN CIV-040 9979
2 NaN CIV-177 5231
3 NaN CIV-016 3678
4 NaN CIV-040 9980
5 NaN CIV-177 4455
6 NaN CIV-016 549
melted2
SRD CR Shared_SRDs
0 CR181 549
1 CR170 4455
2 CR190 5231
3 CR181 9980
5 CR190 9979
# Merge on SRD values
merged = melted1.merge(melted2, on='SRD').sort_values(['Test Case', 'SRD CR']).reset_index(drop=True)
merged
CIVF Test Case SRD Shared_SRDs SRD CR
0 NaN CIV-016 9530n3678n549 549 CR181
1 NaN CIV-040 9979n9980 9980 CR181
2 NaN CIV-040 9979n9980 9979 CR190
3 NaN CIV-177 5231n4455 4455 CR170
4 NaN CIV-177 5231n4455 5231 CR190