我试图在泰坦数据集中制作一个ML模型,在准备它的时候,我使用OneHotEncoder制作了开始的假人,而这样做的时候,我失去了我的列标题。这是数据集之前的样子。
Pclass Sex Age SibSp Parch Fare Cabin Embarked
0 3 1 22.000000 1 0 7.2500 146 2
1 1 0 38.000000 1 0 71.2833 81 0
2 3 0 26.000000 0 0 7.9250 146 2
3 1 0 35.000000 1 0 53.1000 55 2
4 3 1 35.000000 0 0 8.0500 146 2
... ... ... ... ... ... ... ... ...
886 2 1 27.000000 0 0 13.0000 146 2
887 1 0 19.000000 0 0 30.0000 30 2
888 3 0 29.699118 1 2 23.4500 146 2
889 1 1 26.000000 0 0 30.0000 60 0
890 3 1 32.000000 0 0 7.7500 146 1
代码如下:
ct = ColumnTransformer([('encoder', OneHotEncoder(), [7])], remainder='passthrough')
X = pd.DataFrame(ct.fit_transform(X))
X
数据集现在的样子。
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 3.0 1.0 22.000000 1.0 7.2500 146.0
1 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 38.000000 1.0 71.2833 81.0
2 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 3.0 0.0 26.000000 0.0 7.9250 146.0
3 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 35.000000 1.0 53.1000 55.0
4 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 3.0 1.0 35.000000 0.0 8.0500 146.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
886 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 2.0 1.0 27.000000 0.0 13.0000 146.0
887 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 19.000000 0.0 30.0000 30.0
888 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 3.0 0.0 29.699118 1.0 23.4500 146.0
889 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 26.000000 0.0 30.0000 60.0
890 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 3.0 1.0 32.000000 0.0 7.7500 146.0
您可以使用ColumnTransformer
的get_feature_names
方法,前提是您的所有变压器都支持该方法,并且您已经在数据框架上进行了训练。
ct = ColumnTransformer([('encoder', OneHotEncoder(), [7])], remainder='passthrough')
X = pd.DataFrame(ct.fit_transform(X), columns=ct.get_feature_names())
X
fit_transform的输出是array likeX_t{array-like, sparse matrix} of shape (n_samples, sum_n_components)
(不是dataframelike)
因此没有标头。如果需要头文件,则必须在重新构建DataFrame时命名它们。