Getting the same accuracy across different classifiers - sklearn



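I have a training set of 540 images and a test set of 150 images, stored as pixel values. The values are kept in separate CSV files, one image per row, in this format: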

[label],[num0],[num1],...,[num399]

The label is a single letter and the 400 numbers are the pixel values. The data set is for sign language recognition.

Code:

import csv
import numpy as np
from sklearn import svm
from sklearn import linear_model

X_train = []
y_train = []
X_test = []
y_test = []
ylist = []   # distinct labels seen in the training set
with open("20_20_centered_newer.csv", 'r') as file:
    reader = csv.reader(file, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        # row[0] is the label, row[1:] are the 400 pixel values
        y_train.append(row[0])
        if row[0] not in ylist:
            ylist.append(row[0])
        X_train.append(np.array([int(x) for x in row[1:]]))
y2list = []  # distinct labels seen in the test set
with open("20x20_test.csv", 'r') as file:
    reader = csv.reader(file, delimiter=',')
    # the test file is read from its first line (no header is skipped)
    for row in reader:
        y_test.append(row[0])
        if row[0] not in y2list:
            y2list.append(row[0])
        X_test.append(np.array([int(x) for x in row[1:]]))
print(ylist)
print(y2list)
# Each classifier below was tried in turn; all gave the same score.
#clf = linear_model.SGDClassifier().fit(X_train, y_train)
#clf = svm.SVC(kernel='linear').fit(X_train, y_train)
#clf = svm.LinearSVC().fit(X_train, y_train)
clf = linear_model.LogisticRegression().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the test set

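Apparently, I get the same score of 0.78 with every classifier, identical to 12 decimal places!

Is there a semantic error here that I am not aware of?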
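One thing that may be worth checking: with 150 test samples, accuracy can only take values of the form k/150, so two classifiers that get the same number of samples right (0.78 corresponds to 117 of 150) report bit-identical scores even if they are wrong on different samples. The sketch below illustrates this with made-up labels and predictions; the two-class setup and the names `accuracy`, `pred_1` and `pred_2` are hypothetical and not taken from my data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical 150-sample test set with two labels (not the real data).
y_true = ["a"] * 75 + ["b"] * 75

# Two made-up prediction vectors that are wrong on different samples,
# but each gets exactly 117 of the 150 samples right.
pred_1 = ["b"] * 33 + ["a"] * 42 + ["b"] * 75  # wrong on 33 of the "a" samples
pred_2 = ["a"] * 75 + ["a"] * 33 + ["b"] * 42  # wrong on 33 of the "b" samples

score_1 = accuracy(y_true, pred_1)
score_2 = accuracy(y_true, pred_2)

# Number of test samples on which the two prediction vectors differ.
disagreements = sum(p != q for p, q in zip(pred_1, pred_2))

print(score_1, score_2)
print("identical scores:", score_1 == score_2)
print("samples predicted differently:", disagreements)
```

Applied to the real models, the same comparison would use the outputs of `clf.predict(X_test)` from two fitted classifiers in place of `pred_1` and `pred_2`. If the predictions differ while the scores match, the identical score is a coincidence of counts rather than a bug.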

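It may have been because I had too few classes to begin with. I repeated the experiment with 10 classes and, using 5-fold cross-validation, got a difference of about 2% between the classifiers.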
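For reference, a 5-fold comparison of the four classifiers could look roughly like the sketch below. It is only an illustration under assumptions: the data is synthetic, generated with `make_classification` as a stand-in for the CSV files (540 samples, 400 features, 10 classes), and the `max_iter` and `random_state` values are arbitrary choices, not the settings I actually used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the pixel data (assumption, not the real images).
X, y = make_classification(n_samples=540, n_features=400, n_informative=40,
                           n_classes=10, n_clusters_per_class=1,
                           random_state=0)

# The same four linear models tried in the question.
classifiers = {
    "SGDClassifier": SGDClassifier(random_state=0),
    "SVC(linear)": SVC(kernel="linear"),
    "LinearSVC": LinearSVC(max_iter=5000),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in classifiers.items():
    # cv=5 uses stratified 5-fold splits for classifiers.
    scores = cross_val_score(clf, X, y, cv=5)
    results[name] = scores
    print("%-20s mean=%.4f std=%.4f" % (name, scores.mean(), scores.std()))
```

Averaging over five folds gives a mean and a spread per classifier, which makes small differences easier to see than a single score on one 150-sample test set.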
