So I'm trying to remove all the stop words from a text file. The problem is that it's removing letters from every word instead of removing whole words.
def remove_stopwords(input):
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in input if word not in stop_words]
    return filtered_words
Sample Input: Damage from Typhoon Lando soars to P6B
Output: Dge fr Tphn Ln r P6B
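The cause: iterating over a str yields individual characters, so the comprehension filters letters rather than words. A minimal sketch of the difference, using a tiny hypothetical stop set (not NLTK's real list) so it stands alone:

```python
# Tiny hypothetical stop set for illustration only.
stop_words = {'a', 'o', 'to'}

text = "Damage to P6B"

# What the question's code effectively does: iterates characters.
chars = [c for c in text if c not in stop_words]
print(''.join(chars))  # Dmge t P6B

# What was intended: iterate over a list of words.
words = [w for w in text.split() if w not in stop_words]
print(words)  # ['Damage', 'P6B']
```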
Tokenize your str into a list of words before removing the stop words:
from nltk.corpus import stopwords
from nltk import word_tokenize

stoplist = set(stopwords.words('english'))

def remove_stopwords(text):
    # word_tokenize splits the string into words first,
    # so the comprehension filters whole words, not characters
    return [word for word in word_tokenize(text) if word not in stoplist]
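One caveat: NLTK's English stop list is all lowercase, so words like "From" or "The" slip through unless you lowercase before comparing. A sketch of a case-insensitive variant, using str.split() and a small stand-in stop set so it runs without the NLTK corpora (with NLTK available, substitute word_tokenize and set(stopwords.words('english'))):

```python
# Stand-in stop set; NLTK's real list is also all lowercase.
stoplist = {'from', 'to', 'the'}

def remove_stopwords(text):
    # Lowercase only for the membership test so the original casing
    # of kept words is preserved in the output.
    return [word for word in text.split() if word.lower() not in stoplist]

print(remove_stopwords("Damage from Typhoon Lando soars to P6B"))
# ['Damage', 'Typhoon', 'Lando', 'soars', 'P6B']
```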