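I trained a logistic regression model using LogisticRegression, the built-in PySpark MLlib class. However, after training, it cannot be used to predict on other DataFrames; calling it raises either
AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'
or
AttributeError: 'LogisticRegression' object has no attribute 'predict'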
from pyspark.ml.classification import LogisticRegression
model = LogisticRegression(regParam=0.5, elasticNetParam=1.0)
# define the input feature & output column
model.setFeaturesCol('features')
model.setLabelCol('WinA')
model.fit(df_train)
model.setPredictionCol('WinA')
model.predictProbability(df_val['features'])
model.predict(df_val['features'])
AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'
Environment details:
PySpark version:
>>import pyspark
>>pyspark.__version__
3.1.2
JDK version:
>>!java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
Environment: Google Colab
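Your call to model.fit(df_train) does not actually give you a trained model: fit returns the fitted model as a new object and leaves the estimator unchanged, so the variable model is still of type pyspark.ml.classification.LogisticRegression: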
type(model)
# pyspark.ml.classification.LogisticRegression
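So you should capture the returned object, either by assigning it to a new variable or by overwriting the model variable. That gives you a trained logistic regression model of class pyspark.ml.classification.LogisticRegressionModel: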
model = model.fit(df_train)
type(model)
# pyspark.ml.classification.LogisticRegressionModel
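Finally, the .predict and .predictProbability methods take a single feature vector such as a pyspark.ml.linalg.DenseVector, not a DataFrame column like df_val['features']. So I think you want to use .transform instead, since it adds the predicted label and probability as columns to the input DataFrame. Like this: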
predicted_df = model.transform(df_val)