ValueError: A given column is not a column of the dataframe when using ColumnTransformer in an sklearn pipeline



Hello, I am learning the concept of pipelines. I read a CSV file from https://www.kaggle.com/zhangjuefei/birds-bones-and-living-habits and want to apply a pipeline for preprocessing and classification.

I have been following the official sklearn documentation on pipelines. This is the code I am running in Google Colab.

import pandas as pd
data1 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/bird.csv')
from sklearn.compose import ColumnTransformer
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
x = data1.iloc[:,1:11]
y = data1.iloc[:,11:12]
numeric_features = ['huml','humw','ulnal','ulnaw','feml','femw','tibl','tibw','tarl','tarw']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])
categorical_features = ['type']
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])
pipeline_lr = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('LRClassifier', LogisticRegression(random_state=0))
])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,random_state=0)
if 'type' in y_train:
    print('Present')
pipeline_lr.fit(x_train, y_train)

ValueError: 'type' is not in list

ValueError: A given column is not a column of the dataframe

Can anyone suggest how to fix this?

First, import ColumnTransformer and make_column_selector:

from sklearn.compose import ColumnTransformer, make_column_selector

Then run the following code:

preprocessing = ColumnTransformer(transformers=[
    ('numerical', StandardScaler(),
     make_column_selector(dtype_include=np.number))], remainder='passthrough')
pipe = Pipeline([('preprocess', preprocessing)])
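
For context on why the error appears in the question's code: x is built from data1.iloc[:,1:11], while 'type' looks like it sits in column 11 and ends up in y. The ColumnTransformer is therefore asked for a column that x does not contain. Below is a minimal sketch of one way to put this together, assuming 'type' is the label to predict and so needs no transformer of its own. The synthetic DataFrame is only a stand-in for bird.csv (its layout is an assumption), and the `missing` variable is a hypothetical helper for checking column membership.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for bird.csv (assumed layout: 'id', ten numeric columns, 'type').
rng = np.random.default_rng(0)
numeric_features = ['huml', 'humw', 'ulnal', 'ulnaw', 'feml',
                    'femw', 'tibl', 'tibw', 'tarl', 'tarw']
data1 = pd.DataFrame(rng.normal(size=(60, 10)), columns=numeric_features)
data1.insert(0, 'id', range(60))
data1['type'] = rng.choice(['SW', 'W', 'T'], size=60)
data1.loc[3, 'huml'] = np.nan  # one missing value for the imputer to fill

x = data1.iloc[:, 1:11]   # the ten numeric columns only
y = data1['type']         # 1-D target; 'type' is not part of x

# Columns the original ColumnTransformer asked for that x does not have.
missing = [c for c in numeric_features + ['type'] if c not in x.columns]
print(missing)  # ['type']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])

# Select numeric columns by dtype instead of hard-coding names.
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, make_column_selector(dtype_include=np.number))])

pipeline_lr = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('LRClassifier', LogisticRegression(random_state=0))])

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)
pipeline_lr.fit(x_train, y_train)
predictions = pipeline_lr.predict(x_test)
```

If 'type' were meant to be an input feature rather than the target, it would have to be included in x (and something else used as y) before a 'cat' transformer could find it.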
