Trained PySpark logistic regression model has no predict() or predictProbability() method



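I trained a logistic regression model with PySpark MLlib's built-in LogisticRegression class. After training, however, I cannot use it to predict on other DataFrames, because the calls fail with AttributeError: 'LogisticRegression' object has no attribute 'predictProbability' and AttributeError: 'LogisticRegression' object has no attribute 'predict'.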

from pyspark.ml.classification import LogisticRegression
model = LogisticRegression(regParam=0.5, elasticNetParam=1.0)
# define the input feature & output column
model.setFeaturesCol('features')
model.setLabelCol('WinA')
model.fit(df_train)
model.setPredictionCol('WinA')
model.predictProbability(df_val['features'])
model.predict(df_val['features'])
AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'

Environment details:

PySpark version:

>>import pyspark
>>pyspark.__version__
3.1.2
JDK version:

>>!java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)

Environment: Google Colab

Your call

model.fit(df_train)

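does not actually leave you with a trained model in that variable: fit() returns the fitted model as a new object and does not modify the estimator, so model is still of type pyspark.ml.classification.LogisticRegression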

type(model)
# pyspark.ml.classification.LogisticRegression

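So you should capture the returned object, either by assigning it to a new variable or by overwriting the model variable. That gives you the trained logistic regression model, an instance of pyspark.ml.classification.LogisticRegressionModel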

model = model.fit(df_train)
type(model)
# pyspark.ml.classification.LogisticRegressionModel

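Finally, the .predict and .predictProbability methods each expect a single feature vector (for example a pyspark.ml.linalg.DenseVector) as their argument, not a DataFrame column such as df_val['features']. So I think you want to use .transform instead, which returns the input DataFrame with the predicted label and the probability appended as columns. Like this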

predicted_df = model.transform(df_val)
