Find and delete words from a sentence (matching within words) in Python

I have the following sentence:

mainsentence="My words aren't available give didn't give apple and did happening me"
stopwords=['are','did','word', 'able','give','happen']

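I want to remove any word that contains one of the stopwords. For example, "word" should match "words" and remove it, "did" should match "didn't" and remove it, and "able" should remove "available" because "able" occurs inside "available". The result should be: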

finalsentence="My apple and me"

I tried the following code:

querywords = mainsentence.split()
resultwords  = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)

but it only works for exact matches.

Please help.
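The attempt above only catches exact matches because `word.lower() not in stopwords` tests membership in a list, which compares whole strings for equality, whereas `in` between two strings tests for a substring. A minimal illustration using the question's stopword list:

```python
stopwords = ['are', 'did', 'word', 'able', 'give', 'happen']

# List membership: True only if some element equals "aren't" exactly
print("aren't" in stopwords)  # False

# String containment: True if "are" occurs anywhere inside "aren't"
print('are' in "aren't")      # True

# Substring check against every stopword, for a few candidate words
for word in ["aren't", 'available', 'apple']:
    print(word, any(stop in word for stop in stopwords))
# aren't True
# available True
# apple False
```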

You can do the following:

>>> ' '.join([word for word in mainsentence.split() if not any([stopword in word for stopword in stopwords])])
'My apple and me'

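Edit: this doesn't need to be a two-way check; it only has to test whether the word contains a stopword.
Edit 2: updated the result for the updated parameters in the question.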

Case-insensitive version:

' '.join([word for word in mainsentence.split() if not any([stopword.lower() in word.lower() for stopword in stopwords])])
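A slightly tidier variant of the case-insensitive one-liner (a sketch, not part of the original answer): normalize the stopwords once up front and pass a generator to `any()` so it stops at the first hit instead of building a full list for every word. Using `str.casefold()` instead of `lower()` is a choice made here; for this ASCII input both behave the same.

```python
mainsentence = "My words aren't available give didn't give apple and did happening me"
stopwords = ['are', 'did', 'word', 'able', 'give', 'happen']

# Normalize the stopwords once rather than on every comparison
folded_stops = [s.casefold() for s in stopwords]

result = ' '.join(
    word for word in mainsentence.split()
    # any() with a generator short-circuits on the first matching stopword
    if not any(stop in word.casefold() for stop in folded_stops)
)
print(result)  # My apple and me
```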

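The following code meets the requirements as stated in the question, but the result is unlikely to be what you actually want. The overall structure of the code should be right, but you may want to change the partial-match condition (stopword in testword):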

def filter_out_stopwords(text, stopwords):
    result = []
    for word in text.split():
        testword = word.lower()
        flag = True
        for stopword in stopwords:
            if stopword in testword:
                flag = False
                break
        if flag:
            result.append(word)
    return result

' '.join(filter_out_stopwords("My words aren't available give didn't give apple and did happening me", ['are', 'did', 'word', 'able', 'give', 'happen']))
# "My apple and me"
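To see why plain substring matching may not be what you want, here is the same loop logic run on a different, made-up sentence. The condition is pulled out into a `matches` parameter (a hypothetical name, not from the answer) so it can be swapped, for example for a prefix-only test.

```python
def filter_out_stopwords(text, stopwords, matches=lambda stop, word: stop in word):
    """Keep words for which no stopword satisfies `matches` (hypothetical parameter)."""
    result = []
    for word in text.split():
        testword = word.lower()
        if not any(matches(stop, testword) for stop in stopwords):
            result.append(word)
    return result

stops = ['are', 'did', 'word', 'able', 'give', 'happen']
text = "Take care of the stable table"

# Substring matching also removes unrelated words: care, stable, table
substring_result = ' '.join(filter_out_stopwords(text, stops))
print(substring_result)  # Take of the

# A stricter prefix-only condition keeps them (but would also keep "available")
prefix = lambda stop, word: word.startswith(stop)
prefix_result = ' '.join(filter_out_stopwords(text, stops, prefix))
print(prefix_result)  # Take care of the stable table
```

The prefix test is only one possible trade-off: it avoids false positives like "care", but it no longer removes "available" from the question's sentence.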

Alternatively, use a list comprehension with all() (any() can be used equivalently):

def filter_out_stopwords(text, stopwords):
    return [
        word for word in text.split()
        if all(stopword not in word.lower() for stopword in stopwords)]

' '.join(filter_out_stopwords("My words aren't available give didn't give apple and did happening me", ['are', 'did', 'word', 'able', 'give', 'happen']))
# "My apple and me"
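The remark that any() can be used equivalently is De Morgan's law: `all(s not in w for s in ...)` is true exactly when `not any(s in w for s in ...)` is. A quick check over the question's words:

```python
stopwords = ['are', 'did', 'word', 'able', 'give', 'happen']
words = "My words aren't available give didn't give apple and did happening me".split()

for w in words:
    via_all = all(s not in w.lower() for s in stopwords)
    via_any = not any(s in w.lower() for s in stopwords)
    # Both formulations agree for every word
    assert via_all == via_any

kept = [w for w in words if not any(s in w.lower() for s in stopwords)]
print(' '.join(kept))  # My apple and me
```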

You can use regular expressions to solve this kind of problem.

import re

You can find all matching words like this:

words = re.findall(r'[a-z]*did[a-z]*', mainsentence)
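Note that `[a-z]*` does not include the apostrophe, so the match for "didn't" stops at "didn". Adding `'` to the character class captures the whole contraction:

```python
import re

mainsentence = "My words aren't available give didn't give apple and did happening me"

# Apostrophe not in the class: the match ends before "'t"
without_apostrophe = re.findall(r'[a-z]*did[a-z]*', mainsentence)
print(without_apostrophe)  # ['didn', 'did']

# Apostrophe included: the full word "didn't" is matched
with_apostrophe = re.findall(r"[a-z']*did[a-z']*", mainsentence)
print(with_apostrophe)  # ["didn't", 'did']
```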

You can also replace them:

re.sub(r'[a-z]*able[a-z]* ', '', mainsentence)
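Because the pattern ends with a literal space, `re.sub` removes the matched word together with the space after it. It returns a new string, so the result has to be assigned. A word at the very end of the sentence has no trailing space and is therefore not removed by this kind of pattern:

```python
import re

mainsentence = "My words aren't available give didn't give apple and did happening me"

# re.sub returns a new string; mainsentence itself is unchanged
cleaned = re.sub(r'[a-z]*able[a-z]* ', '', mainsentence)
print(cleaned)  # My words aren't give didn't give apple and did happening me

# "me" is the last word and has no trailing space, so nothing is removed
unchanged = re.sub(r'[a-z]*me[a-z]* ', '', mainsentence)
print(unchanged == mainsentence)  # True
```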

So the final answer:

import re

mainsentence="My words aren't available give didn't give apple and did happening me"
stopwords=['are','did','word', 'able','give','happen']
for word in stopwords:
    # Double quotes, so the apostrophe in the character class does not end the string
    mainsentence = re.sub(fr"[a-z']*{re.escape(word)}[a-z']* ", '', mainsentence)
print(mainsentence)
# My apple and me
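The loop above runs one substitution per stopword and relies on each word being followed by a space. A single-pass alternative (a sketch, not the original answer's code): join the escaped stopwords into one alternation, match any run of non-space characters containing one of them, ignore case, then collapse the leftover whitespace.

```python
import re

mainsentence = "My words aren't available give didn't give apple and did happening me"
stopwords = ['are', 'did', 'word', 'able', 'give', 'happen']

# \S* on both sides extends the match to the whole whitespace-delimited word;
# re.escape guards against stopwords that contain regex metacharacters
pattern = re.compile(
    r"\S*(?:" + "|".join(map(re.escape, stopwords)) + r")\S*",
    re.IGNORECASE,
)

# Remove the matching words, then normalize the remaining spaces
finalsentence = " ".join(pattern.sub("", mainsentence).split())
print(finalsentence)  # My apple and me
```

Because the match no longer needs a trailing space, a stopword-containing word at the end of the sentence is removed as well.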

For a more robust, long-term solution to your problem, follow these steps:

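1. Expand contractions, e.g. I've -> I have, didn't -> did not. Read up on contractions.
2. Lemmatize each word to get its base form, i.e. reduce every inflected form to its root. Example: plays, playing and played all become play. We will call the corpus at this point the clean corpus. Read up on lemmatization.
3. Now remove any stopwords from the clean corpus.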
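The three steps can be sketched in plain Python. The `CONTRACTIONS` and `LEMMAS` tables and the `clean` function below are hypothetical, hand-written stand-ins; in practice you would use a contractions library and a real lemmatizer (for example NLTK's `WordNetLemmatizer` or spaCy). Note that this approach compares whole normalized words, so "able" no longer removes "available", and the "not" produced by expanding contractions stays in the output.

```python
# Hypothetical, hand-written tables standing in for real NLP tooling
CONTRACTIONS = {"aren't": "are not", "didn't": "did not", "i've": "i have"}
LEMMAS = {"words": "word", "happening": "happen", "playing": "play", "played": "play"}

def clean(text, stopwords):
    # Step 1: expand contractions token by token
    tokens = []
    for tok in text.split():
        tokens.extend(CONTRACTIONS.get(tok.lower(), tok).split())
    # Step 2: map each token to its (toy) lemma; unknown words pass through
    lemmas = [LEMMAS.get(tok.lower(), tok) for tok in tokens]
    # Step 3: drop tokens whose lemma is exactly a stopword
    stops = {s.lower() for s in stopwords}
    return ' '.join(t for t in lemmas if t.lower() not in stops)

print(clean("My words aren't available give didn't give apple and did happening me",
            ['are', 'did', 'word', 'able', 'give', 'happen']))
# My not available not apple and me
```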

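You may also find the text-cleaning module I wrote interesting; it also includes spelling correction and can be used to build a text-cleaning pipeline.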
