How to concisely create a new DataFrame column by matching an existing column's values against a list of values



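I want to create a new column in a DataFrame by matching the values of an existing column against a predefined list of values. I have two approaches below. Both run, but neither gives me what I want. I prefer the first approach over the second, but I'm not sure where the error is in either. I'd like the solution to be concise, without having to write out a lot of np.where statements.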

First approach:

import numpy as np
import pandas as pd

words = [['one man ran','two men ran','three men ran'],['red balloons','white shirt','blue dress']]
df3 = pd.DataFrame(words, columns = ['col1','col2','col3'])
search_words1 = ['three','blue']
def columns(search_words1):
    for i in search_words1:
        return "".join(np.where((df3['col3'].str.contains(i)), i, ""))


df3['col4'] = df3['col3'].apply(lambda x: columns(x))
df3

Incomplete result:

           col1         col2           col3  col4
0   one man ran  two men ran  three men ran     t
1  red balloons  white shirt     blue dress     b

Second approach:

search_words1 = ['three','blue']
def my_comments(search_words1):
    return "".join([i for i in search_words1 if any(i in x for x in df3['col3'])])

df3['col4'] = df3['col3'].apply(lambda x: my_comments(x))
df3

Incomplete result:

           col1         col2           col3           col4
0   one man ran  two men ran  three men ran  three men ran
1  red balloons  white shirt     blue dress     blue dress

Desired output for both approaches:

           col1         col2           col3   col4
0   one man ran  two men ran  three men ran  three
1  red balloons  white shirt     blue dress   blue

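Use str.extract: build a regular expression pattern from the search words and extract whichever one matches:

pattern = fr"\b({'|'.join(search_words1)})\b"
df3['col4'] = df3['col3'].str.extract(pattern)

Pattern:

>>> print(pattern)
\b(three|blue)\b

\b matches the empty string, but only at the beginning or end of a word. ( ) is a capturing group, so str.extract returns the search word that matched rather than the whole string.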
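For reference, here is a minimal self-contained sketch of the same idea, using the data from the question. The calls to re.escape, expand=False, and fillna are additions made here as assumptions about what is convenient, not part of the answer above, and the "bluebird sings" input is a hypothetical example included only to show the effect of \b.

```python
import re

import pandas as pd

words = [
    ["one man ran", "two men ran", "three men ran"],
    ["red balloons", "white shirt", "blue dress"],
]
df3 = pd.DataFrame(words, columns=["col1", "col2", "col3"])
search_words1 = ["three", "blue"]

# re.escape protects against search words containing regex metacharacters
# (an added safeguard, assumed useful; the words here contain none).
pattern = r"\b(" + "|".join(re.escape(w) for w in search_words1) + r")\b"

# expand=False returns a Series instead of a one-column DataFrame.
df3["col4"] = df3["col3"].str.extract(pattern, expand=False)

# Rows where no search word matches come back as NaN; replace with "".
df3["col4"] = df3["col4"].fillna("")

# Hypothetical extra input: "bluebird" contains "blue", but not as a whole
# word, so the \b anchors prevent a match and the result is NaN.
no_match = pd.Series(["bluebird sings"]).str.extract(pattern, expand=False)

print(pattern)
print(df3)
```

With the question's data, col4 ends up holding three and blue. If a string contains more than one search word, str.extract keeps only the first match; str.findall would be one way to collect all of them.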
