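I want to create a new column in a DataFrame by matching the values of an existing column against a predefined list of values. I have two approaches below. Both run, but neither gives me what I want. I prefer the first approach over the second, but I'm not sure where either one goes wrong. I'd like a concise solution that doesn't require writing out lots of np.where statements.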
First approach:
import numpy as np
import pandas as pd

words = [['one man ran','two men ran','three men ran'],['red balloons','white shirt','blue dress']]
df3 = pd.DataFrame(words, columns = ['col1','col2','col3'])
search_words1 = ['three','blue']
def columns(search_words1):
    for i in search_words1:
        return "".join(np.where((df3['col3'].str.contains(i)), i, ""))
df3['col4'] = df3['col3'].apply(lambda x: columns(x))
df3
Incomplete result:
           col1         col2           col3  col4
0   one man ran  two men ran  three men ran     t
1  red balloons  white shirt     blue dress     b
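Why the first approach goes wrong: apply calls the function once per cell, so its parameter (confusingly also named search_words1) receives a single string such as 'three men ran', and the for loop walks over that string's characters. The return inside the loop then exits on the very first character, which is why col4 holds 't' and 'b'. A minimal sketch of the same loop-based idea that passes the cell and the word list separately (find_word is a hypothetical name; it uses plain substring matching and returns the first listed word found, or an empty string):

```python
import pandas as pd

words = [['one man ran', 'two men ran', 'three men ran'],
         ['red balloons', 'white shirt', 'blue dress']]
df3 = pd.DataFrame(words, columns=['col1', 'col2', 'col3'])
search_words1 = ['three', 'blue']

def find_word(text, search_words):
    # hypothetical helper: text is ONE cell, search_words is the whole list,
    # so the loop runs over words, not over the characters of the cell
    for word in search_words:
        if word in text:  # plain substring test
            return word
    return ""  # only reached when no search word occurs in this cell

df3['col4'] = df3['col3'].apply(lambda x: find_word(x, search_words1))
print(df3)
```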
Second approach:
search_words1 = ['three','blue']
def my_comments(search_words1):
    return "".join([i for i in search_words1 if any(i in x for x in df3['col3'])])
df3['col4'] = df3['col3'].apply(lambda x: my_comments(x))
df3
Incomplete result:
           col1         col2           col3           col4
0   one man ran  two men ran  three men ran  three men ran
1  red balloons  white shirt     blue dress     blue dress
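The second approach fails for a similar reason: x is the cell string, so the comprehension iterates over its characters, and any(i in x for x in df3['col3']) checks each character against the whole column rather than the current row. Every character of a cell occurs somewhere in col3, so the join simply rebuilds the original text. A sketch that tests each search word against the current cell only (the extra text parameter and the space separator used when several words match are assumptions made here, not part of the original):

```python
import pandas as pd

words = [['one man ran', 'two men ran', 'three men ran'],
         ['red balloons', 'white shirt', 'blue dress']]
df3 = pd.DataFrame(words, columns=['col1', 'col2', 'col3'])
search_words1 = ['three', 'blue']

def my_comments(text, search_words):
    # keep every search word that occurs in THIS cell;
    # several hits are joined with a space (assumed separator)
    return " ".join(w for w in search_words if w in text)

df3['col4'] = df3['col3'].apply(lambda x: my_comments(x, search_words1))
print(df3)
```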
Desired output for both approaches:
           col1         col2           col3   col4
0   one man ran  two men ran  three men ran  three
1  red balloons  white shirt     blue dress   blue
Use str.extract: build a regular-expression pattern from the search words and extract whichever one matches:
pattern = fr"\b({'|'.join(search_words1)})\b"
df3['col4'] = df3['col3'].str.extract(pattern)
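A self-contained version of this answer, with the sample data copied from the question. It makes one small assumed change: expand=False, which makes str.extract return a Series (the pattern has exactly one capturing group) instead of a one-column DataFrame. Rows with no match would get NaN:

```python
import pandas as pd

words = [['one man ran', 'two men ran', 'three men ran'],
         ['red balloons', 'white shirt', 'blue dress']]
df3 = pd.DataFrame(words, columns=['col1', 'col2', 'col3'])
search_words1 = ['three', 'blue']

# builds \b(three|blue)\b: a whole-word alternation inside one capturing group
pattern = fr"\b({'|'.join(search_words1)})\b"

# one group + expand=False -> a Series holding the captured word (NaN if none)
df3['col4'] = df3['col3'].str.extract(pattern, expand=False)
print(df3)
```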
The pattern:
>>> print(pattern)
\b(three|blue)\b
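\b matches the empty string, but only at the beginning or end of a word. (...) is a capturing group; str.extract returns the text it captured.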
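To see what the word boundaries buy you, this sketch compares the pattern with and without \b on a small hypothetical Series that includes 'bored kids', where 'red' occurs only inside another word:

```python
import pandas as pd

# hypothetical sample: 'bored' contains 'red' only as part of a longer word
s = pd.Series(['red balloons', 'bored kids', 'three men ran'])
words = ['red', 'three']

with_boundary = fr"\b({'|'.join(words)})\b"   # whole words only
without_boundary = f"({'|'.join(words)})"     # any substring

whole = s.str.extract(with_boundary, expand=False)
partial = s.str.extract(without_boundary, expand=False)
print(whole.tolist())    # 'bored kids' gets NaN: 'red' is not a whole word there
print(partial.tolist())  # 'bored kids' gets 'red' from inside 'bored'
```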