Here is my code:
#Importing the dataset
dataset = pd.read_csv('insurance.csv')
X = dataset.iloc[:, :-2].values
X = pd.DataFrame(X)
#Encoding Categorical data
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:, 1:2] = labelencoder_X.fit_transform(X[:, 1:2])
Sample dataset:
age sex bmi children smoker region charges
19 female 27.9 0 yes southwest 16884.924
18 male 33.77 1 no southeast 1725.5523
28 male 33 3 no southeast 4449.462
33 male 22.705 0 no northwest 21984.47061
32 male 28.88 0 no northwest 3866.8552
31 female 25.74 0 no southeast 3756.6216
46 female 33.44 1 no southeast 8240.5896
37 female 27.74 3 no northwest 7281.5056
37 male 29.83 2 no northeast 6406.4107
60 female 25.84 0 no northwest 28923.13692
When I run the label encoder, I get the following error:
File "E:\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1840, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type
What could be causing this error?
Here is a small demo:
In [36]: from sklearn.preprocessing import LabelEncoder
In [37]: le = LabelEncoder()
In [38]: X = df.apply(lambda c: c if np.issubdtype(df.dtypes.loc[c.name], np.number)
    ...:              else le.fit_transform(c))
In [39]: X
Out[39]:
age sex bmi children smoker region charges
0 19 0 27.900 0 1 3 16884.92400
1 18 1 33.770 1 0 2 1725.55230
2 28 1 33.000 3 0 2 4449.46200
3 33 1 22.705 0 0 1 21984.47061
4 32 1 28.880 0 0 1 3866.85520
5 31 0 25.740 0 0 2 3756.62160
6 46 0 33.440 1 0 2 8240.58960
7 37 0 27.740 3 0 1 7281.50560
8 37 1 29.830 2 0 0 6406.41070
9 60 0 25.840 0 0 1 28923.13692
Source DF:
In [35]: df
Out[35]:
age sex bmi children smoker region charges
0 19 female 27.900 0 yes southwest 16884.92400
1 18 male 33.770 1 no southeast 1725.55230
2 28 male 33.000 3 no southeast 4449.46200
3 33 male 22.705 0 no northwest 21984.47061
4 32 male 28.880 0 no northwest 3866.85520
5 31 female 25.740 0 no southeast 3756.62160
6 46 female 33.440 1 no southeast 8240.58960
7 37 female 27.740 3 no northwest 7281.50560
8 37 male 29.830 2 no northeast 6406.41070
9 60 female 25.840 0 no northwest 28923.13692
Your problem is that you are trying to label-encode a slice.
Steps to reproduce the error:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame({"score": [0, 1], "gender": ["male", "female"]})
enc = LabelEncoder()
enc.fit_transform(df[:,1:2])
...
TypeError: unhashable type: 'slice'
Instead, access your column properly, so that you feed LabelEncoder an array-like of shape (n_samples,): a numpy array, a list, or a pandas Series (see the docs).
Proof:
enc.fit_transform(df["gender"])
array([1, 0])
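Applied to the code in the question, a minimal sketch might look like this. It assumes the column layout shown in the sample dataset and uses a small hand-built frame (three rows copied from the question) in place of insurance.csv. The point is only that selecting one column by position with .iloc, or by name, hands LabelEncoder a 1-D Series rather than a slice:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for pd.read_csv('insurance.csv'): three rows copied from the question.
dataset = pd.DataFrame({
    "age": [19, 18, 28],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 33.77, 33.0],
    "children": [0, 1, 3],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "southeast"],
    "charges": [16884.924, 1725.5523, 4449.462],
})

# Keep X as a DataFrame; there is no need to go through .values and back.
X = dataset.iloc[:, :-2].copy()

labelencoder_X = LabelEncoder()
# X.iloc[:, 1] is the second column ("sex") as a 1-D Series.
X["sex"] = labelencoder_X.fit_transform(X.iloc[:, 1])

print(X)
print(labelencoder_X.classes_)
```

LabelEncoder sorts the labels it sees, so with this data "female" should map to 0 and "male" to 1.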
Finally, if you want to modify your df in place, you can do something along these lines:
for col in df.select_dtypes(include="object").columns:
    df[col] = enc.fit_transform(df[col])
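One caveat with reusing a single encoder in a loop: every fit_transform call refits it, so afterwards it only remembers the classes of the last column. If you need to map the codes back to labels later, a possible variation is to keep one encoder per column. The sketch below is an illustration under assumptions, not part of the original answer: the `encoders` dict and the extra `smoker` column are made up for the example, and `exclude="number"` is used here simply to pick every non-numeric column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame; the "smoker" column is added only to have two text columns.
df = pd.DataFrame({
    "score": [0, 1],
    "gender": ["male", "female"],
    "smoker": ["yes", "no"],
})

# Hypothetical helper dict: one fitted LabelEncoder per encoded column.
encoders = {}
for col in df.select_dtypes(exclude="number").columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Each encoder still knows its own classes, so the codes can be decoded.
original_gender = encoders["gender"].inverse_transform(df["gender"])
print(df)
print(original_gender)
```

The trade-off is a little more bookkeeping in exchange for being able to decode any column later.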