So I'm trying to build a classifier and score its performance. Here is my code:
def svc(train_data, train_labels, test_data, test_labels):
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score
    svc = SVC(kernel='linear')
    svc.fit(train_data, train_labels)
    predicted = svc.predict(test_data)
    actual = test_labels
    score = svc.score(test_data, test_labels)
    print('svc score')
    print(score)
    print('svc accuracy')
    # accuracy_score takes (y_true, y_pred)
    print(accuracy_score(actual, predicted))
Now when I run the function as svc(X, x, Y, y), where:
X.shape = (1000, 150)
x.shape = (1000, )
Y.shape = (200, 150)
y.shape = (200, )
I get the error:
6 predicted = svc.predict(test_classed_data)
7 actual = test_classed_labels
----> 8 score = svc.score(test_classed_data, test_classed_labels)
9 print ('svc score')
10 print (score)
local/lib/python3.4/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
289 """
290 from .metrics import accuracy_score
--> 291 return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
292
293
124 if (y_type not in ["binary", "multiclass", "multilabel-indicator",
125 "multilabel-sequences"]):
--> 126 raise ValueError("{0} is not supported".format(y_type))
127
128 if y_type in ["binary", "multiclass"]:
ValueError: continuous is not supported
The problem seems to be that my test_labels (that is, y) look like this:
[ 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 20.5
20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 25.5 25.5
25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 30.5 30.5 30.5
30.5 30.5 30.5 30.5 30.5 30.5 30.5 30.5 35.5 35.5 35.5 35.5
35.5 35.5 35.5 35.5 35.5 35.5 35.5... ]
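I'm really confused about why SVC doesn't recognize these as discrete labels, since all the examples I've seen use a similar format and work fine for me. Please help.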
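One way to see how scikit-learn interprets a label array is sklearn.utils.multiclass.type_of_target, which appears to be where the y_type checked in the traceback comes from. A minimal sketch with made-up labels in the same style as the question's:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Float labels with a fractional part, like 15.5, 20.5, ... in the question
y_float = np.array([15.5, 15.5, 20.5, 25.5, 30.5, 35.5])
print(type_of_target(y_float))  # continuous

# Floats that are all whole numbers are treated as discrete classes
y_whole = np.array([15.0, 20.0, 25.0])
print(type_of_target(y_whole))  # multiclass

# The same values as strings are always treated as class labels
y_str = y_float.astype(str)
print(type_of_target(y_str))  # multiclass
```

So the ".5" in every label is what makes scikit-learn treat y as a regression target rather than as classes.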
The y passed to the fit and score functions should consist of integers or strings that represent class labels.
If you have the two classes "foo" and 1, you can train an SVM like this:
>>> import numpy as np
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> X = np.random.randn(10, 4)
>>> y = ["foo"] * 5 + [1] * 5
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
Then test its accuracy with:
>>> X_test = np.random.randn(6, 4)
>>> y_test = ["foo", 1] * 3
>>> clf.score(X_test, y_test)
0.5
Float values are evidently still accepted by fit, but they shouldn't be, since class labels are not supposed to be real-valued.
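Applied to float labels like the ones in the question, one option (my suggestion, not something either post mentions) is scikit-learn's LabelEncoder, which gives each distinct value an integer class and can map predictions back with inverse_transform. A minimal sketch; the label values are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical float labels in the same format as the question's
y_float = np.array([15.5, 15.5, 20.5, 25.5, 20.5, 35.5])

encoder = LabelEncoder()
y_int = encoder.fit_transform(y_float)  # one integer per distinct value

print(encoder.classes_)  # the sorted distinct labels
print(y_int)             # [0 0 1 2 1 3]

# Map integer predictions back to the original float labels
print(encoder.inverse_transform([3, 0]))
```

You would then fit SVC on y_int instead of the raw floats, and pass its predictions through encoder.inverse_transform when the original values are needed.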
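From the SVM page of the scikit-learn documentation, http://scikit-learn.org/stable/modules/svm.html#classification:
"As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array Y of integer values"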
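Convert the label array to int, or, if that is too crude (for example, 1.6 and 1.8 would be converted to the same value), assign an integer class label to each unique float value.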
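Both options can be sketched with plain NumPy. The labels here are made up; np.unique with return_inverse=True is one way (my choice) to give each distinct float its own integer class:

```python
import numpy as np

# Made-up labels: two close floats plus values like those in the question
y = np.array([1.6, 1.8, 15.5, 20.5, 15.5])

# Casting truncates toward zero, so 1.6 and 1.8 both become class 1
naive = y.astype(int)
print(naive)  # [ 1  1 15 20 15]

# One integer label per distinct float value instead
classes, y_int = np.unique(y, return_inverse=True)
print(classes)
print(y_int)  # [0 1 2 3 2]

# Indexing classes with y_int recovers the original labels exactly
assert np.array_equal(classes[y_int], y)
```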
I'm not sure why the fit and predict methods don't raise an error.
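To fail at training time instead of in score, the labels can be checked up front with sklearn.utils.multiclass.check_classification_targets, which raises ValueError for continuous targets. As far as I know, more recent scikit-learn releases run this check inside SVC.fit, so newer versions should reject labels like 15.5 during fit; I haven't confirmed which version introduced that. The helper name below is hypothetical:

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets


def is_valid_classification_target(y):
    """Hypothetical helper: True if y can be used as class labels."""
    try:
        check_classification_targets(y)
    except ValueError:
        # Raised for continuous (and other unsupported) target types
        return False
    return True


print(is_valid_classification_target(np.array([15.5, 20.5, 25.5])))  # False
print(is_valid_classification_target(np.array([15, 20, 25])))        # True
print(is_valid_classification_target(np.array(["15.5", "20.5"])))   # True
```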