I'm trying to add words to and remove words from the NLTK stopwords list:
from nltk.corpus import stopwords
stop_words = set(stopwords.words('french'))
#add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = set(stop_words.extend(new_stopwords))
#remove words that are in NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'}
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])
print(final_stop_words)
Output:
Traceback (most recent call last):
File "test_stop.py", line 10, in <module>
new_stopwords_list = set(stop_words.extend(new_stopwords))
AttributeError: 'set' object has no attribute 'extend'
Try this:
from nltk.corpus import stopwords
stop_words = set(stopwords.words('french'))
#add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = stop_words.union(new_stopwords)
#remove words that are in NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'}
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])
print(final_stop_words)
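The same add-then-remove logic can also be written with the set operators | (union) and - (difference). This is only a sketch: base_stopwords is a short, made-up stand-in for stopwords.words('french') so that the snippet runs without downloading the NLTK corpus.

```python
# Hypothetical stand-in for stopwords.words('french'); the real list is much longer.
base_stopwords = ['le', 'la', 'ne', 'pas', 'n', 'et']

stop_words = set(base_stopwords)
new_stopwords = {'cette', 'les', 'cet'}   # words to add
not_stopwords = {'n', 'pas', 'ne'}        # words to keep out of the stopword set

# | builds a new set containing the added words, - drops the unwanted ones
final_stop_words = (stop_words | new_stopwords) - not_stopwords
print(sorted(final_stop_words))
```

Both operators return a new set, so stop_words itself is left unchanged.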
You can use update instead of extend, replacing the line new_stopwords_list = set(stop_words.extend(new_stopwords)) with:
stop_words.update(new_stopwords)
new_stopwords_list = set(stop_words)
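One difference worth keeping in mind: update changes the set in place and returns None, while union returns a new set and leaves the original alone. A small sketch, using a made-up three-word set in place of the NLTK list:

```python
# Hypothetical stand-in for set(stopwords.words('french'))
original = {'le', 'la', 'et'}
new_stopwords = ['cette', 'les', 'cet']

# union returns a new set; original still has 3 items afterwards
combined = original.union(new_stopwords)
size_after_union = len(original)

# update mutates original and returns None, so never wrap it in set(...)
returned = original.update(new_stopwords)
size_after_update = len(original)

print(size_after_union, size_after_update, returned)
```

This is why wrapping stop_words.update(...) in set(...) would fail with a TypeError: the call evaluates to None.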
By the way, giving a set a name that contains the word list can be confusing.
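Use list(set(...)) instead of set(...), because only lists have a method named extend. Note that extend modifies the list in place and returns None, so it has to be called on its own line rather than inside set(...):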
...
stop_words = list(set(stopwords.words('french')))
...
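Filled out, the list-based variant might look like the sketch below. The base_stopwords list is a made-up stand-in for stopwords.words('french'), and the final conversion back to a set is an assumption about what the asker wants as the end result.

```python
# Hypothetical stand-in for stopwords.words('french')
base_stopwords = ['le', 'la', 'ne', 'pas', 'n', 'et']

stop_words = list(set(base_stopwords))      # de-duplicated list
stop_words.extend(['cette', 'les', 'cet'])  # in place; the return value is None

not_stopwords = {'n', 'pas', 'ne'}
# Convert back to a set for fast membership tests, dropping the excluded words
final_stop_words = set(stop_words) - not_stopwords
print(sorted(final_stop_words))
```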