我有两个数据帧,如下所示:
数据帧df1
:
Rep
0 ec21b_AI_154OH
1 m2010_AI_066UW
2 20wh1_DS_416FC
数据帧df2
:
Address FirstPart SecondPart
0 address13 m2010 066UW
1 address22 2020e 999GV
2 address26 2020c 513DT
3 address35 evd18 874GO
4 address36 ep21b 986CG
5 address493 20wh1 416FC
6 address628 ec21b 154OH
我想使用panda在数据帧df1
中添加一个Address
列,这样它看起来如下:
Rep Address
0 ec21b_AI_154OH address628
1 m2010_AI_066UW address13
2 20wh1_DS_416FC address493
对于df2
中的每一行,我都可以在df1
中搜索匹配项,并放置地址。然而,有更好的方法吗?
我的最低工作示例如下:
for First, Second, Add in zip(list(df2['FirstPart']),list(df2['SecondPart']),list(df2['Address'])):
condition = df1['Rep'].str.contains(First) & df1['Rep'].str.contains(Second)
df1.loc[condition,"Address"] = Add
注意:不需要找到_
作为分隔符,也可以有其他分隔符。
IIUC,您可以将extract
这两部分用于merge
(此处带有命名的捕获组,但如果使用left_on
/right_on
参数或merge
,则是可选的(:
out = df1.join(
df1['Rep']
.str.extract(r'^(?P<FirstPart>[^_]+).*?(?P<SecondPart>[^_]+)$')
.merge(df2, how='left')['Address']
)
输出:
Rep Address
0 ec21b_AI_154OH address628
1 m2010_AI_066UW address13
2 20wh1_DS_416FC address493
您可以循环遍历这两列:
这些是你的dfs,在第一个dfs上添加新的列Address:
import pandas as pd
df_dict_1 = {
'Rep':['ec21b_AI_154OH', 'm2010_AI_066UW', '20wh1_DS_416FC'],
'Address': [None, None, None]
}
df_dict_2 = {
'Address':['address13', 'address22', 'address26', 'address35', 'address36',
'address493', 'address628' ],
'FirstPart':['m2010', '2020e', '2020c', 'evd18', 'ep21b',
'20wh1', 'ec21b' ],
'SecondPart':['066UW', '999GV', '513DT', '874GO', '986CG',
'416FC', '154OH' ],
}
df1 = pd.DataFrame(df_dict_1)
df2 = pd.DataFrame(df_dict_2)
循环击败了第二个:
rep_list = list(df1.Rep)
for x in range(df2.shape[0]):
first_part = df2.FirstPart.iloc[x]
second_part = df2.SecondPart.iloc[x]
rep = ''.join([x for x in rep_list if first_part in x and second_part in x])
adress = df2.Address.iloc[x]
for x in range(len(df1.Rep)):
if rep ==df1.Rep[x]:
df1.Address.iloc[x] = adress
输出:
Rep Address
0 ec21b_AI_154OH address628
1 m2010_AI_066UW address13
2 20wh1_DS_416FC address493