How to use SelectFromModel in sklearn to find the positively informative features for a class

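As I understand it, until recently people used the coef_ attribute to extract the most informative features from linear models in sklearn, Python's machine learning library. Users are now being pointed to SelectFromModel instead. SelectFromModel reduces the feature set based on a threshold, so code like the example further down reduces the features to those with an importance > 0.5. My question is: is there any way to determine whether a feature discriminates positively or negatively for a class?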

I have my data in a pandas DataFrame called data. The first column is a list of filenames of text files, and the second column holds the labels.

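import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# read each file named in data["filenames"] and count its words
count_vect = CountVectorizer(input="filename", analyzer="word")
X_train_counts = count_vect.fit_transform(data["filenames"])
print(X_train_counts.shape)
tf_transformer = TfidfTransformer(use_idf=True)
traindata = tf_transformer.fit_transform(X_train_counts)
print(traindata.shape)  # report size of the training data
clf = LogisticRegression()
# keep only the features whose importance is at least 0.5
model = SelectFromModel(clf, threshold=0.5)
X_transform = model.fit_transform(traindata, data["labels"])
print("reduced features: ", X_transform.shape)
# get the names of all features
# (scikit-learn releases before 1.0 use get_feature_names() instead)
words = np.array(count_vect.get_feature_names_out())
# get the names of the important features using the boolean index from model
print(words[model.get_support()])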

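As far as I know, you need to stick with the .coef_ approach and look at which coefficients are negative and which are positive. A negative coefficient lowers the odds of that class occurring (a negative relationship), while a positive coefficient raises the odds of the class occurring (a positive relationship).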
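To make the link between sign and odds concrete: in logistic regression, a one-unit increase in a feature multiplies the odds of the class by exp(coefficient). The sketch below uses made-up coefficient values and hypothetical feature names purely for illustration; they do not come from a fitted model.

```python
import math

# Hypothetical coefficients (illustrative values, not from a fitted model).
coefs = {"feature_a": 0.7, "feature_b": -0.7}

# exp(coef) is the factor by which the odds change for a one-unit
# increase in the feature, holding the other features fixed.
odds_ratios = {name: math.exp(c) for name, c in coefs.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.2f} ({direction} the odds)")
```

A ratio above 1 corresponds to a positive coefficient, and a ratio below 1 to a negative one.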

However, this approach only tells you the direction of the effect, not how important the feature is. For that you will still need SelectFromModel.
