AttributeError: 'list' object has no attribute 'lower' when trying to fit a 2D array



I am trying to train a model to detect fake news, and as part of that I am trying to build a bag-of-words model. However, when I try to fit my model, I get this error:

Traceback (most recent call last):
File "/Users/amanpuranik/PycharmProjects/covid/fake news 2.py", line 89, in <module>
headline_bow.fit(lower)
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1186, in fit
self.fit_transform(raw_documents)
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1220, in fit_transform
self.fixed_vocabulary_)
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1131, in _count_vocab
for feature in analyze(doc):
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 103, in _analyze
doc = preprocessor(doc)
File "/Users/amanpuranik/PycharmProjects/covid/venv/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 68, in _preprocess
doc = doc.lower()
AttributeError: 'list' object has no attribute 'lower'

I don't know why I am getting this error. This is the dataset I am trying to fit:

[['four', 'way', 'bob', 'corker', 'skewer', 'donald', 'trump'], ['linklat', "'s", 'war', 'veteran', 'comedi', 'speak', 'modern', 'america', ',', 'say', 'star'], ['trump', '’', 'fight', 'with', 'corker', 'jeopard', 'his', 'legisl', 'agenda']]

Here is the rest of my code:

data = pd.read_csv("/Users/amanpuranik/Desktop/fake-news-detection/data.csv")
data = data[['Headline', "Label"]]
x = np.array(data['Headline'])
print(x[0])
y = np.array(data["Label"])
# tokenization of the data here
headline_vector = []
for headline in x:
    headline_vector.append(word_tokenize(headline))
print(headline_vector)

stopwords = set(stopwords.words('english'))
#removing stopwords at this part
filtered = [[word for word in sentence if word not in stopwords]
            for sentence in headline_vector]
#print(filtered)

stemmed2 = [[stem(word) for word in headline] for headline in filtered]
#print(stemmed2)
#lowercase
lower = [[word.lower() for word in headline] for headline in stemmed2] #start here
#organising
articles = []

for headline in lower:
    articles.append(headline)
#creating the bag of words model
headline_bow = CountVectorizer()
headline_bow.fit(lower)
a = headline_bow.transform(lower)

Why is this happening, and what can I do to fix it?

My guess:

The documentation for CountVectorizer shows that it expects a list of strings (a 1D list), but you have lists of words (a 2D list).
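To see why this produces exactly your traceback: `fit` runs sklearn's default preprocessor on each document, and that preprocessor calls `.lower()` on it, which fails when the "document" is a list of tokens rather than a string. A minimal reproduction:

```python
# Each "document" in your data is a list of tokens, not a string,
# so the call sklearn's default preprocessor makes fails directly:
doc = ['four', 'way', 'bob', 'corker']
try:
    doc.lower()  # this is what _preprocess does internally
except AttributeError as e:
    print(e)
```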

You probably need to convert each list of words into a single string (a sentence):

lower_sentences = [" ".join(x) for x in lower]

Then you can use:

headline_bow = CountVectorizer()
headline_bow.fit(lower_sentences)
a = headline_bow.transform(lower_sentences)

Working example:

from sklearn.feature_extraction.text import CountVectorizer
lower = [['four', 'way', 'bob', 'corker', 'skewer', 'donald', 'trump'], ['linklat', "'s", 'war', 'veteran', 'comedi', 'speak', 'modern', 'america', ',', 'say', 'star'], ['trump', '’', 'fight', 'with', 'corker', 'jeopard', 'his', 'legisl', 'agenda']]
lower_sentences = [' '.join(words) for words in lower]
headline_bow = CountVectorizer()
headline_bow.fit(lower_sentences)
a = headline_bow.transform(lower_sentences)
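Alternatively — this is a different approach from the join above, not something your original code used — since your headlines are already tokenized, you can pass a callable as CountVectorizer's `analyzer` parameter. A callable analyzer replaces the built-in preprocessing and tokenization entirely, so `.lower()` is never called and the token lists are consumed as-is:

```python
from sklearn.feature_extraction.text import CountVectorizer

lower = [['four', 'way', 'bob', 'corker', 'skewer', 'donald', 'trump'],
         ['linklat', "'s", 'war', 'veteran', 'comedi', 'speak', 'modern',
          'america', ',', 'say', 'star'],
         ['trump', '’', 'fight', 'with', 'corker', 'jeopard', 'his',
          'legisl', 'agenda']]

# The identity function tells CountVectorizer each document is already
# a sequence of features, bypassing its preprocessor and tokenizer.
headline_bow = CountVectorizer(analyzer=lambda tokens: tokens)
a = headline_bow.fit_transform(lower)
print(sorted(headline_bow.vocabulary_))
```

Note that this keeps punctuation tokens like `','` and `'’'` that the default tokenizer would drop, so choose whichever behavior you want.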
