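I am trying to apply OneHotEncoding on the Titanic dataset. My sklearn version is 0.19.2. I label-encoded first, and now, when I try one-hot encoding, it throws the error "could not convert string to float: 'C148'".
First, I label-encoded the "Sex" and "Embarked" features, which succeeded. Now, when I try one-hot encoding, the exception is raised for a value in the "Cabin" feature, which should not be encoded at all. Also, C148 is a value that appears almost at the end of the dataset.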
# Label encoding
import pandas as pd
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
df2['Embarked'] = df2['Embarked'].fillna(method='backfill')
array1 = df2.values
array1[:, 4] = encoder.fit_transform(array1[:, 4])    # column 4: Sex
array1[:, 11] = encoder.fit_transform(array1[:, 11])  # column 11: Embarked
df_encoded1 = pd.DataFrame(array1)

# One hot encoding
from sklearn.preprocessing import OneHotEncoder
hotencoder = OneHotEncoder(categorical_features=[4, 11])
array1 = hotencoder.fit_transform(array1)
ValueError                                Traceback (most recent call last)
<ipython-input-45-c14deb702f63> in <module>()
----> 1 array1= hotencoder.transform(array1)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\preprocessing\data.py in transform(self, X)
   2073         """
   2074         return _transform_selected(X, self._transform,
-> 2075                                    self.categorical_features, copy=True)
   2076
   2077

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
   1807     X : array or sparse matrix, shape=(n_samples, n_features_new)
   1808     """
-> 1809     X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
   1810
   1811     if isinstance(selected, six.string_types) and selected == "all":

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434
    435     if ensure_2d:

ValueError: could not convert string to float: 'C148'
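Besides a fix for the error above, please also tell me how to update to the latest version of sklearn. I tried updating with pip install -U scikit-learn, but it installed version 0.19.2 again.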
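The scikit-learn package you are using appears to be an old release. You can upgrade it to 0.20.x. Otherwise, you can first make it work with LabelEncoder(): as the traceback shows, check_array converts the entire array to float before the selected columns are encoded, so any column that still holds strings (here Cabin, with values such as 'C148') has to be label-encoded or dropped beforehand.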
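As a rough sketch of another route that does not depend on the installed scikit-learn version, the two columns can be one-hot encoded with pandas instead. Note that pd.get_dummies is swapped in here for OneHotEncoder; it is not what the code in the question uses. The small frame below is made-up stand-in data, and the column names Sex, Cabin, Embarked and Fare are assumed to match the usual Titanic CSV. The first part reproduces the cause of the error on a tiny array, the second part encodes only Sex and Embarked and leaves Cabin as it is.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Titanic frame (illustrative values only).
df2 = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Cabin": ["C85", None, "C148", "E46"],
    "Embarked": ["S", "C", None, "Q"],
    "Fare": [7.25, 71.28, 30.0, 8.05],
})

# Cause of the error: casting a mixed array to float fails on the first
# string it meets, even if that column was never meant to be encoded.
try:
    np.array(df2[["Fare", "Cabin"]].values, dtype=float)
    cast_error = None
except ValueError as exc:
    cast_error = str(exc)

# Fill the missing port as in the question, then one-hot encode only
# the two chosen columns; every other column is passed through unchanged.
df2["Embarked"] = df2["Embarked"].bfill()
encoded = pd.get_dummies(df2, columns=["Sex", "Embarked"])

print(cast_error)
print(list(encoded.columns))
```

The Cabin column stays as strings in the result, so it still has to be dropped or encoded separately before the frame is passed to an estimator that expects numeric input.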