Filter a string column by a list without requiring an exact match



I have a pandas DataFrame like this:

Tweets
0   RT @cizzorz: THE CHILLER TRAP *TEMPLE RUN* OBS...
1   Disco Domination receives a change in order to...
2   It's time for the Week 3 #FallSkirmish Trials!...
3   Dance your way to victory in the new Disco Dom...
4   Patch v6.02 is available now with a return fro...
5   Downtime for patch v6.02 has begun. Find out a...
6   💀⛏️... soon
7   Launch into patch v6.02 Wednesday, October 10!...
8   Righteous Fury.\n\nThe Wukong and Dark Vanguar...
9   RT @wbgames: WB Games is happy to bring @Fortn...

I also have a list, for example:

my_list = ['Launch', 'Dance', 'Issue']

I filter the DataFrame with the following command:

ndata = data[data['Tweets'].str.contains( "|".join(my_list), regex=True)].reset_index(drop=True)

The filter does not work when the text contains any of the variants in the right-hand column:

Working        Not Working
Launch        'launch' , 'launch,' , 'Launch,' ,'LAUNCH','@launch'

The expected output should include the sentences that contain any of the following:

'launch' , 'launch,' , 'Launch,' ,'LAUNCH','@launch'

You need to make sure that contains ignores case:

import re

# ...

ndata = data[data['Tweets'].str.contains("|".join(my_list), regex=True,
                                         flags=re.IGNORECASE)].reset_index(drop=True)
#                                        ^^^^^^^^^^^^^^^^^^^
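
As a quick sanity check, the same pattern can be tried with plain re.search, which is what str.contains applies to each element when regex=True. The sketch below is only an illustration: the samples list is made up to cover the variants from the question, and wrapping each word in re.escape is an added precaution (not part of the original command) in case a list entry ever contains regex metacharacters.

```python
import re

my_list = ['Launch', 'Dance', 'Issue']

# Escape each word so characters such as '.' or '+' would be matched literally
# (an added precaution; the words above contain no metacharacters).
pattern = re.compile("|".join(re.escape(word) for word in my_list),
                     flags=re.IGNORECASE)

# Hypothetical sample strings covering the variants from the question,
# plus one string that should not match.
samples = ['launch', 'launch,', 'Launch,', 'LAUNCH', '@launch', 'soon']

# Keep only the strings in which the pattern is found anywhere.
matches = [s for s in samples if pattern.search(s)]
print(matches)
```

With pandas itself, passing case=False to str.contains should have the same effect as flags=re.IGNORECASE.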
