Map a pandas column by matching substrings against a list of dicts



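I have a DataFrame of names and a list of dicts, each holding a name and its count. I need to create a new column based on which names from results appear in each row. The catch is that the match is not exact: only the first name has to occur somewhere in the string. Every solution I have tried so far is very clunky, so I hope this great community can suggest something more elegant.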

import pandas as pd

dic = {"IDs": ['a','b','c','d','e','f','g','k','l','m'],
"names": ['Ailbhe Yowa',
'Hannah Kirst',
'Morris Hunt',
'Flavia Quor in the UK',
'Sarah Smith and Alexandra Libman',
'Flavia Morris, Mark Torre, Ann Moor',
'Rowena Freez',
'Adam Lion in USA',
'Mahmood Jade  in Europe',
'Morris Tool and  Francois Lopin']

}
test = pd.DataFrame(dic)
results = [[{'name': 'Ailbhe', 'count': 17}],
[{'name': 'Mahmood', 'count': 2818}],
[{'name': 'Debbie', 'count': 11493}],
[{'name': 'Arthur', 'count': 20587}],
[{'name': 'Clive', 'count': 2703}],
[{'name': 'Flavia', 'count': 10166}],
[{'name': 'Alexandra', 'count': 1939}],
[{'name': 'Sarah', 'count': 88388}],
[{'name': 'Morris', 'count': 3194}],
[{'name': 'Cameron', 'count': 3334}]]

The desired output should look like this:

    IDs names                               results
0   a   Ailbhe Yowa                         [{'name': 'Ailbhe', 'count': 17}]
1   b   Hannah Kirst    
2   c   Morris Hunt                         [{'name': 'Morris', 'count': 3194}]
3   d   Flavia Quor in the UK               [{'name': 'Flavia', 'count': 10166}]
4   e   Sarah Smith and Alexandra Libman    [{'name': 'Sarah', 'count': 88388}, {'name': 'Alexandra', 'count': 1939}]
5   f   Flavia Morris, Mark Torre, Ann Moor [{'name': 'Flavia', 'count': 10166}]
6   g   Rowena Freez    
7   k   Adam Lion in USA    
8   l   Mahmood Jade in Europe              [{'name': 'Mahmood', 'count': 2818}]
9   m   Morris Tool and Francois Lopin      [{'name': 'Morris', 'count': 3194}]

One way is to use pandas.Series.str.findall:

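# Map each first name to its dict (every inner list in results holds one dict)
name_dict = {l[0]["name"]: l[0] for l in results}
# Build an alternation pattern with one capture group, e.g. "(Ailbhe|Mahmood|...)"
reg = "(%s)" % "|".join(list(name_dict))
# findall returns the matched names per row; look each one up to get its dict
test["results"] = test["names"].str.findall(reg).apply(lambda x: [name_dict[i] for i in x])
print(test)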
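The pattern above matches a name anywhere in the string, so a name such as "Morrison" would also match "Morris". If only whole words should count, a minimal sketch (assuming standard Python regex word boundaries are the right notion of "word" for this data) wraps the alternation in \b and passes each name through re.escape. The "Morrison Kale" row is made up purely to show the difference.

```python
import re

import pandas as pd

test = pd.DataFrame({
    "IDs": ["a", "c", "x"],
    # "Morrison Kale" is a hypothetical row, not part of the original data
    "names": ["Ailbhe Yowa", "Morris Hunt", "Morrison Kale"],
})
results = [[{"name": "Ailbhe", "count": 17}], [{"name": "Morris", "count": 3194}]]

# Map each first name to its dict (each inner list holds one dict here)
name_dict = {l[0]["name"]: l[0] for l in results}

# \b restricts matches to whole words; re.escape guards against regex metacharacters in names
reg = r"\b(%s)\b" % "|".join(map(re.escape, name_dict))

test["results"] = test["names"].str.findall(reg).apply(
    lambda found: [name_dict[n] for n in found]
)
print(test)
```

With the word boundaries in place, "Morrison Kale" gets an empty list instead of the Morris entry.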

Output:

  IDs                                names  
0   a                          Ailbhe Yowa   
1   b                         Hannah Kirst   
2   c                          Morris Hunt   
3   d                Flavia Quor in the UK   
4   e     Sarah Smith and Alexandra Libman   
5   f  Flavia Morris, Mark Torre, Ann Moor   
6   g                         Rowena Freez   
7   k                     Adam Lion in USA   
8   l              Mahmood Jade  in Europe   
9   m      Morris Tool and  Francois Lopin   
                                             results  
0                  [{'name': 'Ailbhe', 'count': 17}]  
1                                                 []  
2                [{'name': 'Morris', 'count': 3194}]  
3               [{'name': 'Flavia', 'count': 10166}]  
4  [{'name': 'Sarah', 'count': 88388}, {'name': '...  
5  [{'name': 'Flavia', 'count': 10166}, {'name': ...  
6                                                 []  
7                                                 []  
8               [{'name': 'Mahmood', 'count': 2818}]  
9                [{'name': 'Morris', 'count': 3194}]  
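findall returns every name it finds, left to right, so row f receives both Flavia and Morris, and rows without a match get [] rather than the blank cell shown in the desired output. If the desired output is meant literally, one possible post-processing step is sketched below on a two-row subset. Keeping only the first match and using an empty string for "no match" are assumptions about what the question wants.

```python
import pandas as pd

test = pd.DataFrame({
    "IDs": ["b", "f"],
    "names": ["Hannah Kirst", "Flavia Morris, Mark Torre, Ann Moor"],
})
results = [[{"name": "Flavia", "count": 10166}], [{"name": "Morris", "count": 3194}]]

name_dict = {l[0]["name"]: l[0] for l in results}
reg = "(%s)" % "|".join(name_dict)

matches = test["names"].str.findall(reg).apply(
    lambda found: [name_dict[n] for n in found]
)

# Keep only the first match per row (findall reports matches left to right)
first_only = matches.apply(lambda found: found[:1])

# Use an empty string instead of [] for rows with no match
test["results"] = first_only.apply(lambda found: found if found else "")
print(test)
```

Mixing lists and strings in one column works for display, but keeping [] is usually easier to process further.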
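Another option is an explicit loop with iterrows. Note that it stops at the first matching name, so each row gets at most one entry from results:

resultsToAdd = []
for index, row in test.iterrows():
    # Look for the first entry in results whose name occurs in this row's names
    for result_row in results:
        isFound = False
        for result in result_row:
            if result['name'] in row['names']:
                isFound = True
                break
        if isFound:
            break
    # Append the matching list, or a blank placeholder if nothing matched
    if isFound:
        resultsToAdd.append(result_row)
    else:
        resultsToAdd.append(" ")
test["results"] = resultsToAdd
print(test)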
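The loop above keeps only the first matching result_row, so a row like "Sarah Smith and Alexandra Libman" ends up with a single entry. If every match should be kept, a minimal sketch is to flatten results once and use Series.map with a plain substring test, which also avoids iterrows. The helper name matching_entries is hypothetical, and the order of matches follows results, not the order in the string.

```python
import pandas as pd

test = pd.DataFrame({
    "IDs": ["a", "e", "g"],
    "names": ["Ailbhe Yowa", "Sarah Smith and Alexandra Libman", "Rowena Freez"],
})
results = [
    [{"name": "Ailbhe", "count": 17}],
    [{"name": "Alexandra", "count": 1939}],
    [{"name": "Sarah", "count": 88388}],
]

# Flatten the list of lists so every dict is checked, however many each inner list holds
all_entries = [entry for group in results for entry in group]


def matching_entries(text):
    """Return every entry whose 'name' occurs as a substring of text, in results order."""
    return [entry for entry in all_entries if entry["name"] in text]


test["results"] = test["names"].map(matching_entries)
print(test)
```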
