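I wrote this custom transformer in Python. The goal is to use it in a Pipeline to sequence my data preprocessing steps. My dataset has 9 numerical columns, and the 10th column is categorical.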
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def _init_(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values
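After defining this class, I get the error listed below when I try to run the following code.
FYI: datasets_num is a DataFrame containing the numerical columns/attributes.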
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_attributes = list(datasets_num)
cat_attributes = ["ocean_proximity"]

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attributes)),
    ('imputer', Imputer(strategy="median")),
    ('std_scalar', StandardScaler())
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attributes)),
    ('label_binarizer', LabelBinarizer())
])
Error:
Traceback (most recent call last):
  File "<ipython-input-34-f509d02ccc6e>", line 7, in <module>
    ('selector', DataFrameSelector(num_attributes)),
TypeError: object() takes no parameters
The problem is here:
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def _init_(self, attribute_names):
You want double underscores on each side of init:
    def __init__(self, attribute_names):
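To see why the single-underscore version fails, here is a minimal sketch that reproduces the problem without scikit-learn. The class names `Broken` and `Fixed` are made up for illustration and are not part of the original code. With `_init_`, Python treats the method as an ordinary one and falls back to the inherited `object.__init__`, which accepts no extra arguments, so passing `attribute_names` raises a `TypeError`. The exact wording of the message varies between Python versions.

```python
class Broken:
    # Single underscores: this is an ordinary method, not the constructor,
    # so it is never called when the class is instantiated.
    def _init_(self, attribute_names):
        self.attribute_names = attribute_names


class Fixed:
    # Double underscores: Python calls this automatically on instantiation.
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names


# Instantiating Broken with an argument falls through to object.__init__,
# which rejects the extra argument.
broken_error = None
try:
    Broken(["longitude", "latitude"])
except TypeError as exc:
    broken_error = exc

# The corrected class stores the argument as expected.
fixed = Fixed(["longitude", "latitude"])

print(type(broken_error).__name__)
print(fixed.attribute_names)
```

The same reasoning should apply to `DataFrameSelector`: neither `BaseEstimator` nor `TransformerMixin` defines a constructor that takes arguments, so the call ends up at `object`, which is why the traceback mentions `object()`.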