What kind of input does sklearn's SVC score method need?



So I'm trying to build a classifier and score its performance. This is my code:

def svc(train_data, train_labels, test_data, test_labels):
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score
    svc = SVC(kernel='linear')
    svc.fit(train_data, train_labels)
    predicted = svc.predict(test_data)
    actual = test_labels
    score = svc.score(test_data, test_labels)
    print ('svc score')
    print (score)
    print ('svc accuracy')
    print (accuracy_score(predicted, actual))

Now when I run the function as svc(X, x, Y, y), where:

X.shape = (1000, 150)    
x.shape = (1000, )   
Y.shape = (200, 150)   
y.shape = (200, )

I get this error:

      6     predicted = svc.predict(test_classed_data)
      7     actual = test_classed_labels
----> 8     score = svc.score(test_classed_data, test_classed_labels)
      9     print ('svc score')
     10     print (score)
local/lib/python3.4/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
    289         """
    290         from .metrics import accuracy_score
--> 291         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    292 
    293 
    124     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
    125                        "multilabel-sequences"]):
--> 126         raise ValueError("{0} is not supported".format(y_type))
    127 
    128     if y_type in ["binary", "multiclass"]:
ValueError: continuous is not supported

The problem is that my test_labels, i.e. y, are in this format:

[ 15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  20.5
  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  25.5  25.5
  25.5  25.5  25.5  25.5  25.5  25.5  25.5  25.5  25.5  30.5  30.5  30.5
  30.5  30.5  30.5  30.5  30.5  30.5  30.5  30.5  35.5  35.5  35.5  35.5
  35.5  35.5  35.5  35.5  35.5  35.5  35.5... ]

I'm really confused: why doesn't SVC recognize these as discrete labels, when all the examples I've seen use labels in a similar format and work fine? Please help.

The y passed to both fit and score should be integers or strings, representing class labels.

For example, if you have two classes "foo" and 1, you can train an SVM like this:

>>> import numpy as np
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> X = np.random.randn(10, 4)
>>> y = ["foo"] * 5 + [1] * 5
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

Then test its accuracy with:

>>> X_test = np.random.randn(6, 4)
>>> y_test = ["foo", 1] * 3
>>> clf.score(X_test, y_test)
0.5

Float values are apparently still accepted by fit, but they shouldn't be, because class labels are not supposed to be real-valued.

From the scikit-learn documentation on SVMs, http://scikit-learn.org/stable/modules/svm.html#classification:

"与其他分类器一样,SVC、NuSVC和LinearSVC将两个数组作为输入:一个大小为[n_samples, n_features]的数组X保存训练样本,另一个数组Y包含整数值"

Convert your label array to int, or, if that is too coarse (e.g. 1.6 and 1.8 would be mapped to the same value), assign an integer class label to each unique float value.
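A minimal sketch of both options, assuming numpy arrays of float labels like those in the question (the variable names are illustrative):

import numpy as np
from sklearn.preprocessing import LabelEncoder

y = np.array([15.5, 15.5, 20.5, 20.5, 25.5])   # labels in the question's format

# Option 1: plain cast to int (note: 1.6 and 1.8 would both become 1)
y_int = y.astype(int)

# Option 2: one integer class per unique float value
y_enc = LabelEncoder().fit_transform(y)        # array([0, 0, 1, 1, 2])

# The same mapping with plain numpy
classes, y_codes = np.unique(y, return_inverse=True)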

Not sure why the fit and predict methods don't throw an error.
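If you want to check how scikit-learn categorizes a given label array (the "continuous is not supported" error in the traceback comes from this kind of target-type check), type_of_target can be used; a small sketch:

from sklearn.utils.multiclass import type_of_target

print(type_of_target([15.5, 20.5, 25.5]))        # 'continuous' -> rejected by accuracy_score
print(type_of_target([15, 20, 25]))              # 'multiclass' -> accepted
print(type_of_target(['15.5', '20.5', '25.5']))  # 'multiclass' -> accepted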
