Removing stop words from a text file without using nltk



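Hi everyone, I want to remove stop words from a text file without using nltk. I have a text file containing a list of stop words, and I want to use that list. Thanks.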

The exact requirement is hard to make out, but I would do something like this:

with open("stopwords.txt") as f:
    # One stop word per line, e.g. "and" and "or" on separate lines
    stopwords = set(f.read().splitlines())
text = "Foo and bar or foo"
tokens = text.split()  # Split into a list of words
# Build a new list rather than calling tokens.remove() inside a loop over
# tokens: removing while iterating skips the word after each removal
tokens = [word for word in tokens if word.lower() not in stopwords]
clean_text = " ".join(tokens)  # Join the remaining words into a string
print(clean_text)  # Outputs: Foo bar foo
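The snippet above filters a hard-coded string, while the question is about a text file. Here is a minimal sketch of the same idea applied to a file, line by line. The helper names (`load_stopwords`, `remove_stopwords`, `clean_file`) and the file paths are made up for illustration, and I am assuming the stop word file has one word per line and that both files are UTF-8.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stop word per line into a lowercase set, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(line, stopwords):
    """Return the line with every whitespace-separated stop word dropped."""
    return " ".join(w for w in line.split() if w.lower() not in stopwords)


def clean_file(input_path, stopwords_path, output_path):
    """Filter input_path line by line and write the result to output_path."""
    stopwords = load_stopwords(stopwords_path)
    with open(input_path, encoding="utf-8") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(remove_stopwords(line, stopwords) + "\n")
```

You would call it as `clean_file("input.txt", "stopwords.txt", "output.txt")`, with whatever paths you actually have. Keep in mind that splitting on whitespace leaves punctuation attached, so a token like "and," will not match the stop word "and"; strip punctuation first if your text needs that.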
