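I am trying to set up a small comparison between the classifiers available in scikit-learn. According to this page, all of the classifiers except svm should work.
Here is my implementation: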
clf = {}
clf['bayes'] = OneVsRestClassifier(MultinomialNB())
clf['lda'] = OneVsRestClassifier(LDA())
clf['decision tree'] = OneVsRestClassifier(DecisionTreeClassifier())
clf['rdc'] = OneVsRestClassifier(RandomForestClassifier())
y_supposes = {}
precision = {}
# Fit each classifier, predict on the test set, and score the predictions.
for classifier in clf:
    clf[classifier].fit(x_train, y_train)
    y_supposes[classifier] = clf[classifier].predict(x_test)
    precision[classifier] = calcul_precision(y_supposes[classifier], y_test)
The problem is that the only classifier that works is the bayes one. The others fail; for example, calling clf['rdc'].fit(x_train, y_train) gives me this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\multiclass.py", line 201, in fit
    n_jobs=self.n_jobs)
  File "C:\Python27\lib\site-packages\sklearn\multiclass.py", line 92, in fit_ovr
    for i in range(Y.shape[1]))
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 517, in __call__
    self.dispatch(function, args, kwargs)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 312, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 136, in __init__
    self.results = func(*args, **kwargs)
  File "C:\Python27\lib\site-packages\sklearn\multiclass.py", line 61, in _fit_binary
    estimator.fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 257, in fit
    check_ccontiguous=True)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 220, in check_arrays
    raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
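I should add that clf['rdc'].fit(x_train.toarray, y_train) (as the error message suggests) also gives me an error.
Can you help me find the step I skipped?
Edit: new progress
I think the problem may come from the type of x_train. I build it as follows: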
x = [{f1 : a, ..., fn : jo}, ..., {f3 : 5}]
y_train = [('label1', ), ..., ('labelZ', 'label72')]
x_train = DictVectorizer().fit_transform(x)
type(x_train)  # <class 'scipy.sparse.csr.csr_matrix'>
I also tried MultinomialNB().fit(np.array(x), np.array(y)), which gave me a new error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\naive_bayes.py", line 308, in fit
    X = X.astype(np.float)
TypeError: float() argument must be a string or a number
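The error message states quite clearly that you are passing a sparse matrix to an estimator that does not support sparse matrices. Of the four classifiers you tested, only MultinomialNB accepts sparse matrix input. For decision trees and random forests, sparse matrix support is a work in progress.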
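As a rough illustration of the point above, here is a minimal sketch with toy data. The arrays and the `to_dense` helper are made up for this example and are not part of the question's code. It fits MultinomialNB on a sparse matrix directly and densifies the same matrix before handing it to a random forest. Releases of scikit-learn newer than the one in the traceback may accept sparse input for more estimators, so the conversion only matters where the error actually appears.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB

# Toy count features (hypothetical data), stored as a sparse CSR matrix.
X_dense = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
X_sparse = sparse.csr_matrix(X_dense)
y = np.array([0, 1, 0, 1])

# MultinomialNB accepts the sparse matrix as-is.
nb = MultinomialNB().fit(X_sparse, y)


def to_dense(X):
    """Return X as a dense ndarray, converting only if it is sparse.

    Hypothetical helper written for this sketch.
    """
    return X.toarray() if sparse.issparse(X) else np.asarray(X)


# For an estimator that requires dense input, densify before fitting.
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(to_dense(X_sparse), y)
```

Keep in mind that densifying a large, very sparse feature matrix can use a lot of memory, so it is worth converting only for the estimators that need it.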
As for np.array(x), it does not do what you think it does. To convert a sparse matrix to a dense array, use x.toarray(), or pass sparse=False to the DictVectorizer constructor.
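The two options can be sketched side by side. The feature dicts below are invented stand-ins for the question's `x`, so treat this as an illustration rather than the asker's actual data. The snippet also shows what np.array does to a list of dicts: it wraps them in an object array instead of building a numeric matrix, which is consistent with the `float()` TypeError in the second traceback.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

# Hypothetical feature dicts standing in for the question's `x`.
x = [{"f1": 1.0, "f2": 2.0}, {"f3": 5.0}]

# np.array on a list of dicts only wraps the dicts in an object array;
# it does not produce a numeric feature matrix.
wrapped = np.array(x)

# Option 1: vectorize to a sparse matrix, then densify explicitly.
# Note the parentheses: toarray is a method and has to be called.
dense_a = DictVectorizer().fit_transform(x).toarray()

# Option 2: ask DictVectorizer for a dense array up front.
dense_b = DictVectorizer(sparse=False).fit_transform(x)
```

Both options should give the same 2-D float array, with one column per feature name. Either result can then be passed to estimators that require dense input.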