Pandas通过两列中的子字符串进行筛选



我有两个数据帧,如下所示:

数据帧df1:

Rep
0   ec21b_AI_154OH
1   m2010_AI_066UW
2   20wh1_DS_416FC

数据帧df2:

Address     FirstPart   SecondPart
0   address13   m2010       066UW
1   address22   2020e       999GV
2   address26   2020c       513DT
3   address35   evd18       874GO
4   address36   ep21b       986CG
5   address493  20wh1       416FC
6   address628  ec21b       154OH

我想使用panda在数据帧df1中添加一个Address列,这样它看起来如下:

Rep             Address
0   ec21b_AI_154OH  address628
1   m2010_AI_066UW  address13
2   20wh1_DS_416FC  address493

对于df2中的每一行,我都可以在df1中搜索匹配项,并放置地址。然而,有更好的方法吗?

我的最低工作示例如下:

for First, Second, Add in zip(list(df2['FirstPart']),list(df2['SecondPart']),list(df2['Address'])):
condition = df1['Rep'].str.contains(First) & df1['Rep'].str.contains(Second)
df1.loc[condition,"Address"] = Add

注意:不需要找到_作为分隔符,也可以有其他分隔符。

IIUC,您可以将extract这两部分用于merge(此处带有命名的捕获组,但如果使用left_on/right_on参数或merge,则是可选的(:

out = df1.join(
df1['Rep']
.str.extract(r'^(?P<FirstPart>[^_]+).*?(?P<SecondPart>[^_]+)$')
.merge(df2, how='left')['Address']
)

输出:

Rep     Address
0  ec21b_AI_154OH  address628
1  m2010_AI_066UW   address13
2  20wh1_DS_416FC  address493

您可以循环遍历这两列:

这些是你的dfs,在第一个dfs上添加新的列Address:

import pandas as pd
df_dict_1 = {
'Rep':['ec21b_AI_154OH', 'm2010_AI_066UW', '20wh1_DS_416FC'],

'Address': [None, None, None]


}
df_dict_2 = {
'Address':['address13', 'address22', 'address26', 'address35', 'address36',
'address493', 'address628' ],

'FirstPart':['m2010', '2020e', '2020c', 'evd18', 'ep21b',
'20wh1', 'ec21b' ],

'SecondPart':['066UW', '999GV', '513DT', '874GO', '986CG',
'416FC', '154OH' ],

}


df1 = pd.DataFrame(df_dict_1)
df2 = pd.DataFrame(df_dict_2)

循环击败了第二个:

rep_list = list(df1.Rep)

for x in range(df2.shape[0]):

first_part = df2.FirstPart.iloc[x]
second_part = df2.SecondPart.iloc[x] 
rep = ''.join([x for x in rep_list if first_part in x and  second_part in x])
adress = df2.Address.iloc[x]
for x in range(len(df1.Rep)):
if rep ==df1.Rep[x]:
df1.Address.iloc[x] = adress

输出:

Rep             Address
0   ec21b_AI_154OH  address628
1   m2010_AI_066UW  address13
2   20wh1_DS_416FC  address493

最新更新