How to search a DataFrame column for a list of strings and return the matched strings in an adjacent column



What I have

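I have a column 'Student' that contains students' names and their personalities. I also have a list named 'qualities' that contains the qualities I need for filtering.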

What I want

I want a column next to 'Student' that returns the strings from the list that match each row.

What I have
import pandas as pd
Personality = {'Student':["Aysha is clever", "Ben is stronger", "Cathy is clever and strong", "Dany is intelligent", "Ella is naughty", "Fred is quieter"]}
index_labels=['1','2','3','4','5','6']
df = pd.DataFrame(Personality,index=index_labels)
qualities = ['calm', 'clever', 'quiet', 'bold', 'strong', 'cute']
What I want

Output

I tried str.findall and then splitting on ',':

df['ex'] = df['Student'].str.findall('|'.join(qualities)).apply(set).str.join(', ')
new = df["ex"].str.split(pat = ",", expand=True)[1]
df =pd.concat([df, new], axis = 1)
df = df.fillna('')
print(df)

This gives:

                      Student              ex        1
1             Aysha is clever          clever
2             Ben is stronger          strong
3  Cathy is clever and strong  clever, strong   strong
4         Dany is intelligent
5             Ella is naughty
6             Fred is quieter           quiet

You can use str.extractall followed by unstack, then join the result to the original DataFrame:

import re
pattern = '|'.join(map(re.escape, qualities))
out = df.join(df['Student'].str.extractall(f'({pattern})')[0].unstack())

Output:

                      Student       0       1
1             Aysha is clever  clever     NaN
2             Ben is stronger  strong     NaN
3  Cathy is clever and strong  clever  strong
4         Dany is intelligent     NaN     NaN
5             Ella is naughty     NaN     NaN
6             Fred is quieter   quiet     NaN
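
If a single adjacent column is preferred over one column per match, the following is a minimal sketch, assuming the same df and qualities as in the question. The column names Matches and WholeWordMatches are made up for illustration. str.findall returns the matches in the order they appear in each string, so neither set() nor a later split is needed.

```python
import re
import pandas as pd

# Same data as in the question.
personality = {'Student': ["Aysha is clever", "Ben is stronger",
                           "Cathy is clever and strong", "Dany is intelligent",
                           "Ella is naughty", "Fred is quieter"]}
df = pd.DataFrame(personality, index=['1', '2', '3', '4', '5', '6'])
qualities = ['calm', 'clever', 'quiet', 'bold', 'strong', 'cute']

# Escape each quality so regex metacharacters are matched literally.
pattern = '|'.join(map(re.escape, qualities))

# Substring matches, joined into one string per row (empty if no match).
# 'Matches' is a hypothetical column name.
df['Matches'] = df['Student'].str.findall(pattern).str.join(', ')

# Assumed variant: whole words only, so 'stronger' no longer matches 'strong'.
# 'WholeWordMatches' is a hypothetical column name.
whole_word = rf'\b(?:{pattern})\b'
df['WholeWordMatches'] = df['Student'].str.findall(whole_word).str.join(', ')

print(df)
```

The second variant wraps the pattern in \b word boundaries, which assumes that partial hits such as 'strong' inside 'stronger' are unwanted. Drop it if substring matches are the goal.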
