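I have a DataFrame containing names, and a list of lists of dictionaries, each holding a name and its count. I need to create a new column based on whether each name appears in results.
The catch is that the match is not exact: it is a partial match based on the first name only. Every solution I have tried so far is quite clunky, so I am hoping this great community can suggest something more elegant.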
import pandas as pd

dic = {"IDs": ['a','b','c','d','e','f','g','k','l','m'],
       "names": ['Ailbhe Yowa',
                 'Hannah Kirst',
                 'Morris Hunt',
                 'Flavia Quor in the UK',
                 'Sarah Smith and Alexandra Libman',
                 'Flavia Morris, Mark Torre, Ann Moor',
                 'Rowena Freez',
                 'Adam Lion in USA',
                 'Mahmood Jade in Europe',
                 'Morris Tool and Francois Lopin']
       }
test = pd.DataFrame(dic)
results = [[{'name': 'Ailbhe', 'count': 17}],
[{'name': 'Mahmood', 'count': 2818}],
[{'name': 'Debbie', 'count': 11493}],
[{'name': 'Arthur', 'count': 20587}],
[{'name': 'Clive', 'count': 2703}],
[{'name': 'Flavia', 'count': 10166}],
[{'name': 'Alexandra', 'count': 1939}],
[{'name': 'Sarah', 'count': 88388}],
[{'name': 'Morris', 'count': 3194}],
[{'name': 'Cameron', 'count': 3334}]]
The desired output should look like this:
   IDs  names                                results
0  a    Ailbhe Yowa                          [{'name': 'Ailbhe', 'count': 17}]
1  b    Hannah Kirst
2  c    Morris Hunt                          [{'name': 'Morris', 'count': 3194}]
3  d    Flavia Quor in the UK                [{'name': 'Flavia', 'count': 10166}]
4  e    Sarah Smith and Alexandra Libman     [{'name': 'Sarah', 'count': 88388}, {'name': 'Alexandra', 'count': 1939}]
5  f    Flavia Morris, Mark Torre, Ann Moor  [{'name': 'Flavia', 'count': 10166}]
6  g    Rowena Freez
7  k    Adam Lion in USA
8  l    Mahmood Jade in Europe               [{'name': 'Mahmood', 'count': 2818}]
9  m    Morris Tool and Francois Lopin       [{'name': 'Morris', 'count': 3194}]
One way is to use pandas.Series.str.findall:
name_dict = {l[0]["name"]: l[0] for l in results}
reg = "(%s)" % "|".join(list(name_dict))
test["results"] = test["names"].str.findall(reg).apply(lambda x: [name_dict[i] for i in x])
print(test)
Output:
IDs names
0 a Ailbhe Yowa
1 b Hannah Kirst
2 c Morris Hunt
3 d Flavia Quor in the UK
4 e Sarah Smith and Alexandra Libman
5 f Flavia Morris, Mark Torre, Ann Moor
6 g Rowena Freez
7 k Adam Lion in USA
8 l Mahmood Jade in Europe
9 m Morris Tool and Francois Lopin
results
0 [{'name': 'Ailbhe', 'count': 17}]
1 []
2 [{'name': 'Morris', 'count': 3194}]
3 [{'name': 'Flavia', 'count': 10166}]
4 [{'name': 'Sarah', 'count': 88388}, {'name': '...
5 [{'name': 'Flavia', 'count': 10166}, {'name': ...
6 []
7 []
8 [{'name': 'Mahmood', 'count': 2818}]
9 [{'name': 'Morris', 'count': 3194}]
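Note that the regex above matches a name anywhere in the string, so the surname in "Flavia Morris" also matches "Morris", whereas the desired output lists only Flavia for that row. A minimal sketch of a first-name-only variant follows. It assumes that people within a cell are separated by a comma or the word "and", and that the first name is the first word of each person; the helper name first_name_matches is hypothetical, and only a few rows of the sample data are used.

```python
import re
import pandas as pd

# A few rows from the question's data, enough to show the difference.
test = pd.DataFrame({
    "IDs": ["e", "f", "m"],
    "names": [
        "Sarah Smith and Alexandra Libman",
        "Flavia Morris, Mark Torre, Ann Moor",
        "Morris Tool and Francois Lopin",
    ],
})
results = [[{"name": "Flavia", "count": 10166}],
           [{"name": "Alexandra", "count": 1939}],
           [{"name": "Sarah", "count": 88388}],
           [{"name": "Morris", "count": 3194}]]

# Map each name to its dictionary for direct lookup.
name_dict = {group[0]["name"]: group[0] for group in results}

def first_name_matches(text):
    """Return the entries whose name is the first word of a person in text.

    Assumption: people are separated by ',' or the word 'and'.
    """
    people = re.split(r",\s*|\s+and\s+", text)
    first_names = [p.split()[0] for p in people if p.strip()]
    return [name_dict[n] for n in first_names if n in name_dict]

test["results"] = test["names"].apply(first_name_matches)
print(test)
```

With this splitting rule, "Morris" counts only when it starts a person's name, so the "Flavia Morris, ..." row keeps Flavia alone while "Morris Tool and ..." still matches Morris.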
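resultsToAdd = []
for index, row in test.iterrows():
    # Look for the first entry in results whose name occurs in this row.
    for result_row in results:
        isFound = False
        for result in result_row:
            # Plain substring test against the full "names" string.
            if result['name'] in row['names']:
                isFound = True
                break
        if isFound:
            break
    if isFound:
        # Only the first matching result_row is kept for the row.
        resultsToAdd.append(result_row)
    else:
        resultsToAdd.append(" ")
test["results"] = resultsToAdd
print(test)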
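The loop above stops at the first match, so a row such as "Sarah Smith and Alexandra Libman" receives a single entry. A minimal sketch that gathers every match in plain Python is shown below. It compares against whole words rather than substrings, which is an assumption about the intended matching, and the helper name all_matches is hypothetical.

```python
import re

names = ["Sarah Smith and Alexandra Libman", "Hannah Kirst"]
results = [[{"name": "Alexandra", "count": 1939}],
           [{"name": "Sarah", "count": 88388}]]

def all_matches(text, groups):
    """Collect every entry whose name appears as a whole word in text."""
    words = set(re.findall(r"\w+", text))
    return [entry for group in groups for entry in group
            if entry["name"] in words]

matched = [all_matches(n, results) for n in names]
print(matched)
```

The entries come back in the order they appear in results, not in the order the names appear in the text.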