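What I have
I have a column 'Student' containing students' names and their personalities. I also have a list named 'qualities' that holds the qualities I need to filter on.
What I want
I want a column next to 'Student' that returns the matching strings from the list.
What I have, as code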
import pandas as pd
Personality = {'Student':["Aysha is clever", "Ben is stronger", "Cathy is clever and strong", "Dany is intelligent", "Ella is naughty", "Fred is quieter"]}
index_labels=['1','2','3','4','5','6']
df = pd.DataFrame(Personality,index=index_labels)
qualities = ['calm', 'clever', 'quiet', 'bold', 'strong', 'cute']
What I tried
I tried building the output with str.findall and then splitting the result on the comma:
df['ex'] = df['Student'].str.findall('|'.join(qualities)).apply(set).str.join(', ')
new = df["ex"].str.split(pat=",", expand=True)[1]
df = pd.concat([df, new], axis=1)
df = df.fillna('')
print(df)
This gave:
                      Student              ex        1
1             Aysha is clever          clever
2             Ben is stronger          strong
3  Cathy is clever and strong  clever, strong   strong
4         Dany is intelligent
5             Ella is naughty
6             Fred is quieter           quiet
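To see why column 1 holds only the second match, it may help to look at the intermediate values. The following is a minimal sketch on a three-row sample, not the exact code above: it skips the apply(set) step so the order of matches stays predictable, and it splits on ", " rather than "," to avoid a leading space. The names s, found and parts are illustrative.

```python
import pandas as pd

# Small sample of the 'Student' column (illustrative subset)
s = pd.Series(["Aysha is clever", "Cathy is clever and strong", "Dany is intelligent"])
qualities = ['calm', 'clever', 'quiet', 'bold', 'strong', 'cute']

# findall returns a list of every match per row (empty list when nothing matches)
found = s.str.findall('|'.join(qualities))
print(found.tolist())

# Joining and splitting again spreads the matches over columns 0, 1, ...
# Selecting [1] afterwards would keep only the second match of each row.
parts = found.str.join(', ').str.split(', ', expand=True)
print(parts)
```

Selecting a single column from the split result therefore drops the first match, which is why only row 3 has a value there.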
You can use str.extractall and unstack, then join the result to the original DataFrame:
import re
pattern = '|'.join(map(re.escape, qualities))
out = df.join(df['Student'].str.extractall(f'({pattern})')[0].unstack())
Output:
                      Student       0       1
1             Aysha is clever  clever     NaN
2             Ben is stronger  strong     NaN
3  Cathy is clever and strong  clever  strong
4         Dany is intelligent     NaN     NaN
5             Ella is naughty     NaN     NaN
6             Fred is quieter   quiet     NaN
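For reference, here is a self-contained sketch of the same extractall / unstack / join approach, broken into steps. The intermediate names matches and wide, and the "quality_" column prefix, are assumptions added for illustration and are not part of the answer above.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {'Student': ["Aysha is clever", "Ben is stronger", "Cathy is clever and strong",
                 "Dany is intelligent", "Ella is naughty", "Fred is quieter"]},
    index=['1', '2', '3', '4', '5', '6'],
)
qualities = ['calm', 'clever', 'quiet', 'bold', 'strong', 'cute']

# Escape each quality so it is matched literally, then build an alternation
pattern = '|'.join(map(re.escape, qualities))

# extractall yields one row per match, indexed by (original index, match number)
matches = df['Student'].str.extractall(f'({pattern})')[0]

# unstack pivots the match number into columns 0, 1, ...
wide = matches.unstack()

# Optional: give the new columns readable names (hypothetical prefix)
out = df.join(wide.add_prefix('quality_'))
print(out)
```

Rows without any match (here '4' and '5') are absent from the extractall result, so the join leaves NaN in their new columns.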