sklearn2pmml does not seem to support custom transformation functions?



My pipeline uses custom transformation functions, and I cannot convert it successfully with sklearn2pmml.

Here is the code for my custom functions:

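import pandas as pd

# all_cate_dict is defined elsewhere (not shown).

def calc_modify_days(X):
    # Reformat 'YYYYMMDD' strings as 'YYYY-MM-DD'; empty or late dates fall back to 2022-12-30.
    X['modify_date_new'] = X['modify_date'].apply(lambda x: x[:4]+'-'+x[4:6]+'-'+x[6:8] if x != '' and x < '20221230' else '2022-12-30')
    # Days between day_id and the modification date, with negative values replaced by -1.
    X['modify_days'] = (pd.to_datetime(X['day_id']) - pd.to_datetime(X['modify_date_new'])).dt.days
    X['modify_days'] = X['modify_days'].apply(lambda x: -1 if x < 0 else x)
    return X['modify_days']

def transform_channel_ty_cd(X):
    # Map each channel type to its code; unknown values map to 0.
    return X.apply(lambda x: all_cate_dict['channel_type_cd_3'].get(x) if x in all_cate_dict['channel_type_cd_3'] else 0)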

Below is the pipeline code. It works for prediction:

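from sklearn.preprocessing import FunctionTransformer
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.pipeline import PMMLPipeline

# clf_1 is a classifier defined elsewhere (not shown).
mapper_encode = [
    (['day_id', 'modify_date'], FunctionTransformer(calc_modify_days), {'alias': 'modify_days'}),
    ('channel_type_cd_3', FunctionTransformer(transform_channel_ty_cd)),
]
mapper = DataFrameMapper(mapper_encode, input_df=True, df_out=True)
pipeline_test = PMMLPipeline(steps=[
    ("mapper", mapper),
    ("classifier", clf_1),
])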

But when I try to convert the pipeline to a PMML file, I get an error:

Standard output is empty
Standard error:
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 61 ms.
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
INFO: Converting..
Oct 27, 2022 3:43:25 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Oct 27, 2022 3:43:25 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Java class net.razorvine.pickle.objects.ClassDictConstructor)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:92)
at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:63)
at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:43)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
at sklearn.Initializer.encodeFeatures(Initializer.java:44)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn.Composite.encodeFeatures(Composite.java:129)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
at org.jpmml.sklearn.Main.run(Main.java:228)
at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to numpy.core.UFunc
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
... 12 more
Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Java class net.razorvine.pickle.objects.ClassDictConstructor)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:82)
at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:92)
at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:63)
at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:43)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
at sklearn.Initializer.encodeFeatures(Initializer.java:44)
at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
at sklearn.Composite.encodeFeatures(Composite.java:129)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
at org.jpmml.sklearn.Main.run(Main.java:228)
at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to numpy.core.UFunc
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
... 12 more

I tried looking it up, and the problem seems to be related to the lambda functions used with FunctionTransformer.

How should I fix it?

I tried dumping the pipeline to a pkl.z file first and then converting that to a PMML file, but I got a similar error.

I also tried removing the lambda functions, but the conversion still fails as long as any custom feature-processing function is involved.

This question has been answered in jpmml/sklearn2pmml#354.

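In short, a FunctionTransformer instance that contains lambda functions (or references local functions) cannot be pickled; this is a limitation of Python. The SkLearn2PMML package is merely complaining that the pipeline object it received is incomplete. The stack trace above reflects this: the converter expected the func attribute to be a NumPy ufunc and found an unresolved object reference (ClassDictConstructor) instead.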
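The pickling behaviour can be checked with the standard library alone. The following is a minimal sketch, not code from the linked issue; json.dumps is used only as a convenient stand-in for a named, importable function, and try_pickle is a hypothetical helper.

```python
import json
import pickle

def try_pickle(obj):
    """Return the pickled bytes, or the exception raised while pickling."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError) as exc:
        return exc

# A lambda has no importable name, so pickle refuses to serialize it.
lambda_result = try_pickle(lambda x: x + 1)
print(type(lambda_result).__name__)

# A named function is pickled by reference: the payload stores the module
# and function names, not the function body.
named_payload = try_pickle(json.dumps)
print(named_payload)
```

Because only a name is stored, a non-Python consumer of the pickle file has no way to recover what the function does, which would explain why custom Python functions fail on the Java side even when they are not lambdas.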

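In the present case, the user was able to implement their date-time arithmetic business logic using standard PMML constructs (implemented as transformer classes in the sklearn2pmml.preprocessing module). There was no need to use lambda functions at all.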
