Random forest score method

I am trying to find the score of a given dataset with respect to some training data. I wrote the following code:

from sklearn.ensemble import RandomForestClassifier
import numpy as np
randomForest = RandomForestClassifier(n_estimators = 200)
li_train1 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_train2 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_text1 = [[10,20,30,40,50,60,70,80,90], [10,20,30,40,50,60,70,80,90]]
li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
randomForest.fit(li_train1, li_train2)
output =  randomForest.score(li_train1, li_text1)

When I run the program, I get the following error:

Traceback (most recent call last):
  File "trial.py", line 16, in <module>
    output =  randomForest.score(li_train1, li_text1)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 89, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: multiclass-multioutput is not supported

The documentation for the score method says:

score(X, y, sample_weight=None)
X : array-like, shape = (n_samples, n_features)
    Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
    True labels for X.

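In my case both X and y are arrays, specifically 2D arrays.

I also went through this question, but I still don't understand where I am going wrong.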

EDIT

Following the answer and the comments on it, I edited the program as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np
randomForest = RandomForestClassifier(n_estimators = 200)
mlb = MultiLabelBinarizer()
li_train1 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_train2 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_text1 = [100,200]
li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
randomForest.fit(li_train1, li_train2)
output =  randomForest.score(li_train1, li_text1)

After the edit, I get this error:

Traceback (most recent call last):
  File "trial.py", line 19, in <module>
    output =  randomForest.score(li_train1, li_text1)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 82, in _check_targets
    "".format(type_true, type_pred))
ValueError: Can't handle mix of binary and multiclass-multioutput

According to the documentation:

Warning: At present, no metric in sklearn.metrics supports the multioutput-multiclass classification task.

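The score method calls sklearn's accuracy metric, and that metric does not support the multi-class, multi-output classification problem you have defined.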
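To see how scikit-learn classifies each of the label arrays involved, one option is `sklearn.utils.multiclass.type_of_target`. The sketch below is only illustrative: the variable names `y_question`, `y_edit` and `y_answer` are made up here, and the values are copied from the label arrays that appear in this thread.

```python
from sklearn.utils.multiclass import type_of_target

# Labels used in fit() in the question: 2 samples, 9 outputs each,
# with more than two distinct values.
y_question = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9]]

# Labels passed to score() after the edit: one value per sample.
y_edit = [100, 200]

# Labels with one class per sample, two distinct classes.
y_answer = [0, 1]

for name, y in [("question", y_question), ("edit", y_edit), ("answer", y_answer)]:
    print(name, type_of_target(y))
```

A model fitted on labels shaped like `y_question` predicts one row of 9 values per sample, so comparing those predictions with a flat list such as `y_edit` mixes two target types, which is what the second traceback reports.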

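It isn't clear from your question whether you really intend to solve a multi-class, multi-output problem. If that is not your intention, you should restructure your input arrays.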

If, on the other hand, you really do want to solve this kind of problem, you will need to define your own scoring function.
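What such a scoring function should measure depends on the problem, so the following is only a sketch of one possibility. `multioutput_accuracy` is a hypothetical helper, not part of scikit-learn; it assumes both arguments have shape (n_samples, n_outputs) and reports two numbers: the accuracy averaged over the output columns, and the fraction of samples for which every output is correct.

```python
import numpy as np


def multioutput_accuracy(y_true, y_pred):
    """Return (mean per-output accuracy, exact-match ratio)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    matches = y_true == y_pred                # boolean array, one entry per label
    per_output = matches.mean(axis=0)         # accuracy of each output column
    exact_match = matches.all(axis=1).mean()  # samples with every output right
    return float(per_output.mean()), float(exact_match)


# Hand-made example: the third output of the first sample is wrong.
y_true = [[1, 2, 3], [4, 5, 6]]
y_pred = [[1, 2, 0], [4, 5, 6]]
mean_acc, exact = multioutput_accuracy(y_true, y_pred)
print(mean_acc, exact)
```

With a classifier fitted on multi-output labels, the second argument would be the result of `predict` on the test samples, and the first would be test labels of the same shape.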

Since you are not solving a multi-class, multi-output problem, you should restructure your data so that it looks something like this:

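from sklearn.ensemble import RandomForestClassifier
# create the classifier (same settings as in the question)
randomForest = RandomForestClassifier(n_estimators=200)
# training data: one row of features per sample
X = [
    [1,2,3,4,5,6,7,8,9],
    [1,2,3,4,5,6,7,8,9]
]
# one class label per sample
y = [0,1]
# fit the model
randomForest.fit(X,y)
# test data
Xtest = [
    [1,2,0,4,5,6,0,8,9],
    [1,1,3,1,5,0,7,8,9]
]
ytest = [0,1]
# mean accuracy of the predictions on the test data
output = randomForest.score(Xtest,ytest)
print(output) # 0.5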
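As the traceback in the question shows, score is a thin wrapper around `accuracy_score(y, self.predict(X))`. A small check of that equivalence, reusing the data above, might look like the following; the names `forest`, `via_score` and `via_metric` and the `random_state=0` setting are choices made for this sketch only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Same restructured data as above: one label per training sample.
X = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
]
y = [0, 1]
Xtest = [
    [1, 2, 0, 4, 5, 6, 0, 8, 9],
    [1, 1, 3, 1, 5, 0, 7, 8, 9],
]
ytest = [0, 1]

# random_state is fixed only to make repeated runs comparable.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# score() and an explicit accuracy computation on the predictions
# should give the same number.
via_score = forest.score(Xtest, ytest)
via_metric = accuracy_score(ytest, forest.predict(Xtest))
print(via_score, via_metric)
```

Because the labels are now a flat list, `predict` returns one label per test sample and the accuracy metric can compare the two directly.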
