Inverse scaler transform inside an sklearn pipeline



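I am trying to apply standardization and then impute missing values with KNN. After that I want to inverse-transform the values, because I will apply other transformations that need the data on its original scale. Is this possible inside a scikit-learn pipeline? Whatever I try, I get an error.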

Note: the inverse transform should happen inside the pipeline, not after the pipeline has finished.


import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder, FunctionTransformer
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

ss = StandardScaler()
imputer = KNNImputer(n_neighbors=3, add_indicator=False)
ohe = OneHotEncoder()

df_example = pd.DataFrame(data={
    "num1": [1, 2, 3, np.nan, 6, 6, 9, 4, 5],
    "num2": [4, np.nan, 6, 5, 3, 8, 2, 8, 3],
    "cat1": ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'B'],
})
list_numeric_vars = ["num1", "num2"]
list_cat_vars = ["cat1"]

pipeline_num = Pipeline([
    ("standardizer", ss),
    ("imputer", imputer),
    ("standardizer_inverse", FunctionTransformer(ss.inverse_transform)),
])
pipeline_cat = Pipeline([
    ("ohe", ohe),
])

ct = ColumnTransformer(
    transformers=[
        ("pipeline_num", pipeline_num, list_numeric_vars),
        ("pipeline_cat", pipeline_cat, list_cat_vars),
    ],
    remainder="drop",
)
ct.fit(df_example)  # Error

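StandardScaler applies a per-column linear rescaling, and KNNImputer fills each missing value with the mean of its n nearest neighbors, so the averaging step commutes with the scaling. For this data, running standardizer >> imputer >> inverse_standardizer therefore produces the same result as running the imputer alone. This relies on the scaling not changing which neighbors are selected. That holds here because each incomplete row has only one observed numeric feature; with several observed features on different scales, the selected neighbors (and hence the imputed values) can differ.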
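The averaging part of that argument can be checked numerically. The following is a minimal sketch with made-up neighbor values and made-up column statistics (`neighbors`, `mu`, and `sigma` are illustrative assumptions, not values taken from the question):

```python
import numpy as np

# Hypothetical values of the nearest neighbors in one column (illustrative only).
neighbors = np.array([1.0, 3.0, 6.0])

# Hypothetical column mean and standard deviation, standing in for what a scaler learns.
mu, sigma = 4.5, 2.4

# Path 1: standardize the neighbors, average them, then undo the standardization.
scaled_mean = ((neighbors - mu) / sigma).mean()
round_trip = scaled_mean * sigma + mu

# Path 2: average the raw neighbor values directly.
direct = neighbors.mean()

# Both paths agree up to floating-point rounding.
print(np.isclose(round_trip, direct))
```

This only covers the averaging step; it says nothing about whether the same neighbors are picked before and after scaling.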

You can simplify your numeric pipeline as follows:

pipeline_num = Pipeline([
    ("imputer", imputer),
    # Add other processing steps here
])
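For completeness, here is a sketch of how the simplified numeric pipeline might slot back into the ColumnTransformer from the question. It reuses the question's data and step names; the defensive densifying step and the `result` variable are additions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df_example = pd.DataFrame({
    "num1": [1, 2, 3, np.nan, 6, 6, 9, 4, 5],
    "num2": [4, np.nan, 6, 5, 3, 8, 2, 8, 3],
    "cat1": ["A", "B", "C", "A", "B", "C", "A", "A", "B"],
})

# Numeric branch: imputation only, no scaling round trip.
pipeline_num = Pipeline([
    ("imputer", KNNImputer(n_neighbors=3)),
])
# Categorical branch: unchanged from the question.
pipeline_cat = Pipeline([
    ("ohe", OneHotEncoder()),
])

ct = ColumnTransformer(
    transformers=[
        ("pipeline_num", pipeline_num, ["num1", "num2"]),
        ("pipeline_cat", pipeline_cat, ["cat1"]),
    ],
    remainder="drop",
)

result = ct.fit_transform(df_example)
# The combined output may be sparse depending on its density, so densify defensively.
result = result.toarray() if hasattr(result, "toarray") else np.asarray(result)

print(result.shape)            # two numeric columns plus three one-hot columns
print(np.isnan(result).any())  # no missing values should remain
```

Because no step refers to another step's fitted state, the ColumnTransformer can clone and fit this pipeline without trouble.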

Here is the "proof" that imputation alone produces the same result:

# Scale, impute in the scaled space, then map back to the original units
df1 = ss.fit_transform(df_example[list_numeric_vars])
df1 = imputer.fit_transform(df1)
df1 = ss.inverse_transform(df1)
print(f'Scale/Impute/Inverse-Scale:\n{df1}\n')

# Impute directly on the unscaled data
df2 = imputer.fit_transform(df_example[list_numeric_vars])
print(f'Impute Only:\n{df2}\n')

The output is as follows:

Scale/Impute/Inverse-Scale:
[[1.         4.        ]
 [2.         6.        ]
 [3.         6.        ]
 [3.33333333 5.        ]
 [6.         3.        ]
 [6.         8.        ]
 [9.         2.        ]
 [4.         8.        ]
 [5.         3.        ]]

Impute Only:
[[1.         4.        ]
 [2.         6.        ]
 [3.         6.        ]
 [3.33333333 5.        ]
 [6.         3.        ]
 [6.         8.        ]
 [9.         2.        ]
 [4.         8.        ]
 [5.         3.        ]]
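If you do need the scale/impute/inverse-scale round trip to happen inside the pipeline, one option is to wrap all three operations in a single custom transformer. The sketch below is an assumption-laden illustration, not an existing scikit-learn class: `ScaledKNNImputer` and its `scaler_` and `imputer_` attributes are hypothetical names. A likely reason the original attempt fails is that ColumnTransformer clones its transformers before fitting, so `ss.inverse_transform` ends up bound to a scaler that was never fitted. Keeping the scaler inside one estimator avoids that.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ScaledKNNImputer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: impute in standardized space, return original units."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Create the inner estimators at fit time so cloning the wrapper is safe.
        self.scaler_ = StandardScaler()
        self.imputer_ = KNNImputer(n_neighbors=self.n_neighbors)
        # StandardScaler ignores NaNs when fitting and keeps them when transforming.
        self.imputer_.fit(self.scaler_.fit_transform(X))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        X_scaled = self.scaler_.transform(X)
        X_imputed = self.imputer_.transform(X_scaled)
        # Map the imputed values back to the original units.
        return self.scaler_.inverse_transform(X_imputed)


df_num = pd.DataFrame({
    "num1": [1, 2, 3, np.nan, 6, 6, 9, 4, 5],
    "num2": [4, np.nan, 6, 5, 3, 8, 2, 8, 3],
})

pipeline_num = Pipeline([
    ("scaled_imputer", ScaledKNNImputer(n_neighbors=3)),
    # Add other processing steps here
])
imputed = pipeline_num.fit_transform(df_num)
print(np.isnan(imputed).any())  # no missing values should remain
```

The resulting `pipeline_num` can be passed to a ColumnTransformer like any other pipeline, since the wrapper exposes only `n_neighbors` as a constructor parameter and builds its fitted state in `fit`.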
