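I am trying to get the same result as the code below using str.contains(), but I can't reproduce it.
The goal is to filter the "question" column of the DataFrame "data" so that only rows containing both 'England' and 'King' are kept.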
def filter_data(data, words):
    # Keep rows whose "question" contains every word in `words`, ignoring case
    matches = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data["question"].apply(matches)]

answer = filter_data(data, ['England', 'King'])
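To pin down exactly what `filter_data` keeps, here is a small self-contained run on a made-up DataFrame (the sample sentences are hypothetical). Because it tests `word.lower() in x.lower()`, it is a case-insensitive substring test: "kingdom" or "king" also counts as containing "King".

```python
import pandas as pd


def filter_data(data, words):
    # Keep rows whose "question" contains every word, ignoring case (plain substring test)
    matches = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data["question"].apply(matches)]


# Hypothetical sample data for illustration
data = pd.DataFrame({"question": [
    "Which King of England signed the Magna Carta?",
    "England won the World Cup in 1966",
    "Who was the first king of the united kingdom of england?",
    "The King and I",
]})

answer = filter_data(data, ["England", "King"])
print(answer.index.tolist())  # → [0, 2]
```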
My code:
re_filter = data[
    (data.question.str.contains("(w|W)England(w|W)", regex=True, case=False)) &
    (data.question.str.contains("(w|W)King(w|W)", regex=True, case=False))
]
Is it because my regular expression is wrong? Thanks a lot for any help!
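Most likely yes. As written, `(w|W)England(w|W)` demands a literal "w" (or "W", which `case=False` treats the same) immediately before and after "England", so ordinary sentences never match. If the backslashes were lost and the intended pattern was `(\w|\W)England(\w|\W)`, it would still require one extra character on each side, so a word at the very start or end of the string fails. The sketch below checks both cases with Python's `re` module, then shows a `str.contains` version that mirrors `filter_data`'s substring test by passing `regex=False`; the sample DataFrame is hypothetical.

```python
import re

import pandas as pd

# The pattern as written: a literal "w"/"W" must sit right next to the word
print(re.search("(w|W)England(w|W)", "Is the King of England here?", re.IGNORECASE))  # → None

# A "\w|\W" wrapper needs one extra character on each side, so a word at the edge fails
print(re.search(r"(\w|\W)England(\w|\W)", "England has a King", re.IGNORECASE))  # → None

# Hypothetical sample data
data = pd.DataFrame({"question": [
    "England has a King",
    "Is the King of England here?",
    "Only England here",
]})

# Substring match, case-insensitive, no regex: same test as word.lower() in x.lower()
re_filter = data[
    data.question.str.contains("England", case=False, regex=False)
    & data.question.str.contains("King", case=False, regex=False)
]
print(re_filter.index.tolist())  # → [0, 1]
```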
This is the simplest way:
data[data.question.str.contains(r'(?=.*England)(?=.*King)', case=False)]
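Each (?=.*...) lookahead only checks that the word occurs somewhere later in the string without consuming any characters, so both words must be present, in either order. With case=False, the match is case-insensitive.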
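The lookahead pattern extends naturally to any list of words. Below is a minimal sketch with a hypothetical helper, `filter_with_lookaheads`, that builds one `(?=.*word)` per word and runs each word through `re.escape` so characters such as `.` or `?` are matched literally. One assumption worth flagging: by default `.` does not match a newline, so a question with the two words on different lines would be missed; passing `flags=re.DOTALL` (which pandas combines with the `re.IGNORECASE` implied by `case=False`) avoids that. The sample data is made up.

```python
import re

import pandas as pd


def filter_with_lookaheads(data, words):
    # Hypothetical helper: one zero-width lookahead per word, so every word must appear, in any order
    pattern = "".join(f"(?=.*{re.escape(word)})" for word in words)
    # case=False adds re.IGNORECASE; re.DOTALL lets ".*" run across line breaks
    return data[data.question.str.contains(pattern, case=False, flags=re.DOTALL)]


# Hypothetical sample data
data = pd.DataFrame({"question": [
    "The king of England",
    "England\nhas a King",
    "Just England",
]})

print(filter_with_lookaheads(data, ["England", "King"]).index.tolist())  # → [0, 1]
```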
You can try:
import pandas as pd
df = pd.DataFrame(data={'question': ['I have both England and King', 'I have just England', 'I have just King']})
# Keep rows that contain both words (case-sensitive substring match)
print(df[df.question.str.contains('England') & df.question.str.contains('King')])
Output:
question
0 I have both England and King
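That version is case-sensitive and hard-codes the two words, whereas `filter_data` ignores case and accepts any list. A sketch that closes both gaps combines one boolean mask per word with `&`. The helper name `filter_all_words` is hypothetical, and `regex=False` keeps it a plain substring test like `word.lower() in x.lower()`.

```python
from functools import reduce
import operator

import pandas as pd


def filter_all_words(df, words):
    # Hypothetical helper: one case-insensitive substring mask per word, combined with &
    masks = (df.question.str.contains(word, case=False, regex=False) for word in words)
    # Start from an all-True mask so an empty word list keeps every row, like all([])
    everything = pd.Series(True, index=df.index)
    return df[reduce(operator.and_, masks, everything)]


df = pd.DataFrame(data={'question': ['I have both England and King',
                                     'I have just England',
                                     'I have just King',
                                     'i have both england and king']})

print(filter_all_words(df, ['England', 'King']).index.tolist())  # → [0, 3]
```

Starting the reduction from an all-True Series keeps the behavior aligned with `filter_data`, which also returns every row when `words` is empty.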