[AttributeError: 'tuple' object has no attribute 'lower']


train = [('I love this sandwich.','pos'),
     ('This is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('This is my best work.', 'pos'),
     ('What an awesome view', 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this.", 'neg'),
     ('He is my sworn enemy!.', 'neg'),
     ('My boss is horrible.', 'neg')
    ]

This is my training data. I'm using CountVectorizer, but it gives me an error:

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
text_train_cv = cv.fit_transform(train)
text_train_cv.shape

Please help.

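You are using the cv.fit_transform method incorrectly. According to the documentation, this method accepts an iterable of strings, but you are passing it a list of tuples. By default CountVectorizer lowercases every document (lowercase=True), so it calls .lower() on each element, and a tuple has no such method. That is where the AttributeError comes from.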
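A minimal reproduction of the failure, assuming a reasonably recent scikit-learn with the default lowercase=True. Two pairs in the same (text, label) shape as the training data are enough to trigger it:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two (text, label) pairs shaped like the question's training data
pairs = [('I love this sandwich.', 'pos'), ('My boss is horrible.', 'neg')]

cv = CountVectorizer()
error_message = None
try:
    # Each document is a tuple, so the default lowercasing step calls tuple.lower()
    cv.fit_transform(pairs)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```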

Here is an excerpt from the documentation:

fit_transform(raw_documents, y=None)

Parameters: raw_documents : iterable

An iterable which generates either str, unicode or file objects.

One way to fix it:

text_train_cv = cv.fit_transform(list(zip(*train))[0])
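Written out in full, the same idea unzips the (text, label) pairs into a tuple of texts and a tuple of labels, then passes only the texts to fit_transform. This is a sketch: the names texts and labels are illustrative, and keeping the labels assumes you will want them later, for example to train a classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer

train = [('I love this sandwich.', 'pos'),
         ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'),
         ('This is my best work.', 'pos'),
         ('What an awesome view', 'pos'),
         ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'),
         ("I can't deal with this.", 'neg'),
         ('He is my sworn enemy!.', 'neg'),
         ('My boss is horrible.', 'neg')]

# zip(*train) transposes the pairs: the first tuple holds the texts, the second the labels
texts, labels = zip(*train)

cv = CountVectorizer()
# fit_transform now receives an iterable of strings, as the documentation requires
text_train_cv = cv.fit_transform(texts)
print(text_train_cv.shape)  # one row per sentence, one column per vocabulary term
```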
