Sklearn trying to convert a list of strings to floats

I'm trying to get the sklearn.svm.SVC(kernel="linear") algorithm to work. My X is an array built with [misc.imread(each).flatten() for each in filenames], and my y2 is a list of strings such as ["A","1","4","F"..].

When I call clf.fit(X,y2), sklearn tries to convert my list of strings to floats and fails with ValueError: could not convert string to float. How can I fix this?

Edit: upgrading sklearn to 0.15 solved the problem.
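
For illustration, here is a minimal sketch of what the upgrade makes possible, assuming a reasonably recent scikit-learn release: SVC accepts the string labels directly and returns strings from predict. The toy 2-D points are made up and only stand in for the flattened image vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, well-separated toy data standing in for the flattened images.
X = np.array([[0.0, 0.0], [0.1, 0.2],   # class "A"
              [5.0, 5.0], [5.1, 4.9],   # class "1"
              [0.0, 5.0], [0.2, 5.1],   # class "4"
              [5.0, 0.0], [4.9, 0.1]])  # class "F"
y2 = ["A", "A", "1", "1", "4", "4", "F", "F"]

clf = SVC(kernel="linear")
clf.fit(X, y2)  # string labels are passed as-is, no manual conversion

# Predictions come back as the original string labels.
pred = clf.predict(np.array([[0.05, 0.1], [5.0, 5.05]]))
print(clf.classes_)  # the distinct labels seen during fit, in sorted order
print(pred)
```

If this still raises the ValueError from the question, the installed version likely predates the fix, and one of the explicit encodings in the answers is the safer route.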

scikit-learn has a helper class that handles this nicely, sklearn.preprocessing.LabelEncoder:

from sklearn.preprocessing import LabelEncoder
y2 = ["A","1","4","F","A","1","4","F"]
lb = LabelEncoder()
y = lb.fit_transform(y2)
# y is now: array([2, 0, 1, 3, 2, 0, 1, 3])

To get back to the original labels (for example, after classifying unseen data with the SVC), use LabelEncoder's inverse_transform to recover the string labels:

lb.inverse_transform(y)
# => array(['A', '1', '4', 'F', 'A', '1', '4', 'F'], dtype='|S1')
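
Putting the two steps together, the round trip might look like the sketch below. The training points and X_new are made-up toy data, not the image arrays from the question; only LabelEncoder, SVC, fit_transform, predict and inverse_transform are the actual scikit-learn calls.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Made-up toy data: two points per class, far apart from each other.
X_train = np.array([[0.0, 0.0], [0.1, 0.2],   # "A"
                    [5.0, 5.0], [5.1, 4.9],   # "1"
                    [0.0, 5.0], [0.2, 5.1],   # "4"
                    [5.0, 0.0], [4.9, 0.1]])  # "F"
y2 = ["A", "A", "1", "1", "4", "4", "F", "F"]

lb = LabelEncoder()
y = lb.fit_transform(y2)          # strings -> integer codes

clf = SVC(kernel="linear")
clf.fit(X_train, y)               # train on the integer codes

# Unseen points (also made up): one near the "4" cluster, one near "F".
X_new = np.array([[0.1, 4.9], [4.95, 0.05]])
pred_int = clf.predict(X_new)     # integer codes
pred_labels = lb.inverse_transform(pred_int)  # back to strings
print(pred_labels)
```

Keep the fitted lb object around for as long as the classifier is in use, since it is the only record of which integer maps to which string.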

You need to assign a unique integer to each unique string label. I assume your y2 variable contains multiple instances of each class.

So perhaps it looks more like:

y2 = ["A","1","4","F","A","1","4","F"]

Now you can do something like this:

S = set(y2) # collect unique label names
D = dict( zip(S, range(len(S))) ) # assign each string an integer, and put it in a dict
Y = [D[y2_] for y2_ in y2] # store class labels as ints

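For the y2 above, this yields something like the following (the exact integers depend on the set's iteration order):

>>> print(Y)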
[0, 1, 2, 3, 0, 1, 2, 3]
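
A small variation on the same idea, sketched here as one possible refinement rather than part of the original answer: sorting the unique labels makes the mapping reproducible between runs, and an inverse dictionary lets you turn predicted integers back into strings. The names to_int and to_label are just illustrative.

```python
y2 = ["A", "1", "4", "F", "A", "1", "4", "F"]

# Sort the unique labels so the label -> integer mapping is deterministic.
labels = sorted(set(y2))
to_int = {label: i for i, label in enumerate(labels)}
to_label = {i: label for label, i in to_int.items()}  # inverse mapping

Y = [to_int[label] for label in y2]    # class labels as ints
restored = [to_label[i] for i in Y]    # and back to the original strings

print(Y)         # [2, 0, 1, 3, 2, 0, 1, 3]
print(restored)
```

With sorted labels the integers happen to match what LabelEncoder produces for the same input, which makes it easy to switch between the two approaches.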
