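Why do my coefficients have only 1 dimension?

I am trying to run sentiment analysis on some review data. The response variable is either "positive" or "negative". I ran my model and the coefficients have only 1 dimension; I believe it should be 2, since there are two response values. Any help figuring out why is appreciated.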

from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn import cross_validation
from sklearn.metrics import classification_report
import numpy as np
from sklearn.metrics import accuracy_score
import textblob as TextBlob

#scikit
comments = list(['happy','sad','this is negative','this is positive', 'i like this', 'why do i hate this'])
classes = list(['positive','negative','negative','positive','positive','negative'])

# preprocess creates the term frequency matrix for the review data set
stop = stopwords.words('english')
count_vectorizer = CountVectorizer(analyzer =u'word',stop_words = stop, ngram_range=(1, 3))
comments = count_vectorizer.fit_transform(comments)
tfidf_comments = TfidfTransformer(use_idf=True).fit_transform(comments)

# preparing data for split validation. 80% training, 20% test (test_size=0.2)
data_train,data_test,target_train,target_test = cross_validation.train_test_split(tfidf_comments,classes,test_size=0.2,random_state=43)
classifier = BernoulliNB().fit(data_train,target_train)
classifier.coef_.shape

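The last line prints (1L, 6L). I am trying to find the most informative features for the negative and the positive class, but since the first dimension is 1L, it gives the same information for both responses.

Thanks!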

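If you look at the source code of scikit-learn's preprocessing module, the LabelBinarizer class implements the one-vs-all scheme for multi-class classification. You can see there that when only two classes are present, it learns a single set of coefficients that predicts whether a sample belongs to class "1"; if it does not, the classifier predicts "0". That is why coef_ has one row rather than two in the binary case.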
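
To make this concrete, here is a minimal sketch using the toy comments from the question. It first shows that LabelBinarizer produces a single column for two classes but one column per class for three, and then reads the fitted `feature_log_prob_` attribute of BernoulliNB, which has one row per class, to rank features for each class separately. The `top_features` helper and the 'neutral' label are hypothetical names introduced only for illustration, and the vectorizer is left at its defaults rather than using the stop-word and n-gram settings from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import LabelBinarizer

comments = ['happy', 'sad', 'this is negative', 'this is positive',
            'i like this', 'why do i hate this']
classes = ['positive', 'negative', 'negative', 'positive', 'positive', 'negative']

# Two classes collapse to one 0/1 column; three classes get one column each.
two_class = LabelBinarizer().fit_transform(classes)
three_class = LabelBinarizer().fit_transform(['positive', 'negative', 'neutral'])
print(two_class.shape)    # (6, 1)
print(three_class.shape)  # (3, 3)

# Default CountVectorizer settings (assumption: simpler than the question's setup).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(comments)
clf = BernoulliNB().fit(X, classes)

# Feature names ordered by column index, built from the fitted vocabulary.
feature_names = np.array(sorted(vectorizer.vocabulary_,
                                key=vectorizer.vocabulary_.get))

def top_features(clf, feature_names, n=3):
    """Hypothetical helper: rank features per class by how much higher their
    log-probability is in that class than the average over the other classes."""
    log_prob = clf.feature_log_prob_  # shape (n_classes, n_features)
    top = {}
    for i, label in enumerate(clf.classes_):
        others = np.delete(log_prob, i, axis=0).mean(axis=0)
        order = np.argsort(log_prob[i] - others)[::-1][:n]
        top[str(label)] = [str(feature_names[j]) for j in order]
    return top

print(clf.feature_log_prob_.shape)  # (2, 10)
print(top_features(clf, feature_names))
```

Unlike coef_, the per-class rows here keep the positive and negative classes apart, so each class gets its own ranking. On a data set this small several features tie, so the order within each list is not meaningful.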
