StackOverflowError in a Spark environment on Watson Studio (IBM Cloud)



I am following the Spark tutorial from the Watson Studio Gallery on IBM Cloud (https://eu-de.dataplatform.cloud.ibm.com/exchange/public/entry/view/99b857815e69353c04d95daefb3b91fa?context=cpdaas) and I am running into a Java stack overflow error:

Py4JJavaError: An error occurred while calling o20418.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError
java.lang.StackOverflowError
at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:516)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)

The line that fails:

cvModel = crossval.fit(trainingRatings)

The cell that fails:

from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
# `ratings` and `als` are defined in earlier cells of the tutorial notebook.
(trainingRatings, validationRatings) = ratings.randomSplit([80.0, 20.0])
evaluator = RegressionEvaluator(metricName='rmse', labelCol='rating', predictionCol='prediction')
# 3 ranks x 1 maxIter x 3 regParams = 9 parameter combinations
paramGrid = ParamGridBuilder().addGrid(als.rank, [1, 5, 10]).addGrid(als.maxIter, [20]).addGrid(als.regParam, [0.05, 0.1, 0.5]).build()
crossval = CrossValidator(estimator=als, estimatorParamMaps=paramGrid, evaluator=evaluator, numFolds=10)
cvModel = crossval.fit(trainingRatings)  # this call raises the StackOverflowError
predictions = cvModel.transform(validationRatings)
# Drop rows with missing values (e.g. NaN predictions) before computing RMSE
print('The root mean squared error for our model is: {}'.format(evaluator.evaluate(predictions.na.drop())))

Environment: Default Spark 3.2 & Python 3.9

Any help would be appreciated.

I solved this by increasing the memory of the virtual machine that runs the notebook.
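For a rough sense of how much work the failing cell asks for, the sketch below counts the model fits implied by the grid and fold count in the question. It is plain Python and does not touch Spark; the names `grid`, `num_folds`, `param_maps` and the totals are illustrative only, and the count by itself does not prove what triggered the StackOverflowError.

```python
from itertools import product

# Grid values and fold count copied from the cell in the question.
grid = {
    "rank": [1, 5, 10],
    "maxIter": [20],
    "regParam": [0.05, 0.1, 0.5],
}
num_folds = 10

# ParamGridBuilder yields one parameter map per combination of grid values.
param_maps = [dict(zip(grid, combo)) for combo in product(*grid.values())]

# CrossValidator fits every parameter map once per fold,
# then refits the best parameter map on the full training set.
fits_during_cv = len(param_maps) * num_folds
total_fits = fits_during_cv + 1

# Every parameter map here uses maxIter=20, so each fit runs up to 20 ALS iterations.
total_als_iterations = total_fits * grid["maxIter"][0]

print(len(param_maps), fits_during_cv, total_fits, total_als_iterations)
```

If adding memory is not an option, a smaller grid or a lower `numFolds` shrinks these numbers proportionally, at the cost of a coarser search.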
