Sklearn: could not convert string to float



I'm using Sklearn as my machine learning tool, but every time I run my code it gives this error:

Traceback (most recent call last):
  File "C:\Users\FakeUserMadeUp\Desktop\Python\Machine Learning\MachineLearning.py", line 12, in <module>
    model.fit(X_train, Y_train)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 942, in fit
    X_idx_sorted=X_idx_sorted,
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 166, in fit
    X, y, validate_separately=(check_X_params, check_y_params)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\base.py", line 578, in _validate_data
    X = check_array(X, **check_X_params)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py", line 746, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 1993, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: 'Paris'

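Here is the code, followed by my dataset:

(I have tried several different datasets. Also, this dataset is a .txt file because I made it myself and I'm too dumb to convert it to CSV.)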

import pandas as pd
from sklearn.tree import DecisionTreeClassifier as dtc
from sklearn.model_selection import train_test_split as tts
city_data = pd.read_csv('TimeZoneTable.txt')
X = city_data.drop(columns=['Country'])
Y = city_data['Country']
X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2)
model = dtc()
model.fit(X_train, Y_train)
predictions = model.predict(X_test)
print(Y_test)
print(predictions)

Dataset:

CityName,Country,Latitude,Longitude,TimeZone
Moscow,Russia,55.45'N,37.37'E,3
Vienna,Austria,48.13'N,16.22'E,2
Barcelona,Spain,41.23'N,2.11'E,2
Madrid,Spain,40.25'N,3.42'W,2
Lisbon,Portugal,38.44'N,9.09'W,1
London,UK,51.30'N,0.08'W,1
Cardiff,UK,51.29'N,3.11'W,1
Edinburgh,UK,55.57'N,3.11'W,1
Dublin,Ireland,53.21'N,6.16'W,1
Paris,France,48.51'N,2.21'E,2

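Machine learning algorithms, and random forests in particular, only work on numeric input. If you want to improve your model, it is even recommended to normalize your data to the range -1 to 1, which means working with decimal numbers. That is why a float is expected.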
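The normalization mentioned above could look roughly like the sketch below. It assumes scikit-learn's MinMaxScaler with feature_range=(-1, 1), which is my choice for illustration and not something the answer names, and it uses only the numeric TimeZone column from the question's dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# TimeZone values copied from the question's dataset, shaped as one feature column.
time_zone = np.array([3, 2, 2, 2, 1, 1, 1, 1, 1, 2], dtype=float).reshape(-1, 1)

# Assumption: min-max scaling is used to map the column onto the range [-1, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(time_zone)

# The smallest value maps to -1, the largest to 1, the rest fall in between.
print(scaled.ravel())
```

Tree-based models split on thresholds, so they are largely insensitive to this kind of rescaling. It tends to matter more for distance-based or gradient-based models.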

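In your case, your DataFrame contains mostly string entries: CityName, Latitude and Longitude are all read as strings, and only TimeZone is numeric. As Dilara Gokay said, you first need to convert the strings to numbers, and to do that you can use what is called a OneHotEncoder. If you don't know how to do that, I suggest following this tutorial.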
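As a minimal sketch of the one-hot encoding suggested above, the code below wraps the question's DecisionTreeClassifier in a pipeline with a OneHotEncoder. The dataset is inlined so the example is self-contained. The list string_columns, the use of ColumnTransformer, and fitting on all ten rows without a train/test split are my assumptions for illustration, not part of the original code.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# The dataset from the question, inlined instead of reading TimeZoneTable.txt.
RAW = """CityName,Country,Latitude,Longitude,TimeZone
Moscow,Russia,55.45'N,37.37'E,3
Vienna,Austria,48.13'N,16.22'E,2
Barcelona,Spain,41.23'N,2.11'E,2
Madrid,Spain,40.25'N,3.42'W,2
Lisbon,Portugal,38.44'N,9.09'W,1
London,UK,51.30'N,0.08'W,1
Cardiff,UK,51.29'N,3.11'W,1
Edinburgh,UK,55.57'N,3.11'W,1
Dublin,Ireland,53.21'N,6.16'W,1
Paris,France,48.51'N,2.21'E,2
"""

city_data = pd.read_csv(io.StringIO(RAW))
X = city_data.drop(columns=["Country"])
Y = city_data["Country"]

# Assumption: every string column is one-hot encoded, and the numeric
# TimeZone column is passed through unchanged.
string_columns = ["CityName", "Latitude", "Longitude"]
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), string_columns)],
    remainder="passthrough",
)

# The encoder runs before the tree, so fit() no longer receives raw strings.
model = make_pipeline(encoder, DecisionTreeClassifier(random_state=0))
model.fit(X, Y)
predictions = model.predict(X)
print(predictions)
```

With handle_unknown="ignore", a city that was not seen during fitting is encoded as all zeros and does not raise an error. Note that one-hot encoding treats each coordinate string as an unrelated category. Parsing Latitude and Longitude into signed floats would probably keep more of their meaning, but that goes beyond what the answer proposes.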
