"NotSerializableException " 在 scala 映射函数中



I am reading a file and trying to map the values with a function, but it throws a NotSerializableException. Below is the code I am running:

package com.sundogsoftware.spark

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.log4j._
import scala.math.min

/** Find the minimum temperature by weather station */
object MinTemperatures {

  def parseLine(line: String) = {
    val fields = line.split(",")
    val stationID = fields(0)
    val entryType = fields(2)
    val temperature = fields(3).toFloat * 0.1f * (9.0f / 5.0f) + 32.0f
    (stationID, entryType, temperature)
  }

  /** Our main function where the action happens */
  def main(args: Array[String]) {
    // Set the log level to only print errors
    Logger.getLogger("org").setLevel(Level.ERROR)

    // Create a SparkContext using every core of the local machine
    val sc = new SparkContext("local[*]", "MinTemperatures")

    // Read each line of input data
    val lines = sc.textFile("../DataSet/1800.csv")

    // Convert to (stationID, entryType, temperature) tuples
    val parsedLines = lines.map(parseLine)
  }
}

When I run the code above, it gives me the following error:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:393)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
    at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:371)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.map(RDD.scala:370)
    at com.sundogsoftware.spark.MinTemperature$.main(MinTemperature.scala:32)
    at com.sundogsoftware.spark.MinTemperature.main(MinTemperature.scala)

Caused by: java.io.NotSerializableException: com.sundogsoftware.spark.MinTemperature$
Serialization stack:
    - object not serializable (class: com.sundogsoftware.spark.MinTemperature$, value: com.sundogsoftware.spark.MinTemperature$@41fed14f)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 1)
    - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
    - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class com.sundogsoftware.spark.MinTemperature$, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic com/sundogsoftware/spark/MinTemperature$.$anonfun$main$1:(Lcom/sundogsoftware/spark/MinTemperature$;Ljava/lang/String;)Lscala/Tuple3;, instantiatedMethodType=(Ljava/lang/String;)Lscala/Tuple3;, numCaptured=1])
    - writeReplace data (class: java.lang.invoke.SerializedLambda)

But when I run the same code with an anonymous function, it runs successfully:

package com.sundogsoftware.spark

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.log4j._
import scala.math.min

/** Find the minimum temperature by weather station */
object MinTemperatures {

  /** Our main function where the action happens */
  def main(args: Array[String]) {
    // Set the log level to only print errors
    Logger.getLogger("org").setLevel(Level.ERROR)

    // Create a SparkContext using every core of the local machine
    val sc = new SparkContext("local[*]", "MinTemperatures")

    // Read each line of input data
    val lines = sc.textFile("../DataSet/1800.csv")

    // Convert to (stationID, entryType, temperature) tuples
    val parsedLines = lines.map(x => {
      val fields = x.split(",")
      val stationID = fields(0)
      val entryType = fields(2)
      val temperature = fields(3).toFloat * 0.1f * (9.0f / 5.0f) + 32.0f
      (stationID, entryType, temperature)
    })

    // Filter out all but TMIN entries
    val minTemps = parsedLines.filter(x => x._2 == "TMIN")

    // Convert to (stationID, temperature)
    val stationTemps = minTemps.map(x => (x._1, x._3.toFloat))

    // Reduce by stationID retaining the minimum temperature found
    val minTempsByStation = stationTemps.reduceByKey((x, y) => min(x, y))

    // Collect, format, and print the results
    val results = minTempsByStation.collect()
    for (result <- results.sorted) {
      val station = result._1
      val temp = result._2
      val formattedTemp = f"$temp%.2f F"
      println(s"$station minimum temperature: $formattedTemp")
    }
  }
}

Output:

EZE00100082 minimum temperature: 7.70 F
ITE00100554 minimum temperature: 5.36 F

As you can see above, when I use the named function (parseLine) in map, it throws the error, but the very same program runs successfully when I pass an anonymous function to map instead of the named one.

I have looked through several blog posts but could not find the cause of the error. Can anyone help me understand this?

This issue does not seem to be related to sbt or the dependencies. As far as I checked, it happens when the script is not defined as an object (Scala objects are serializable by default), so this error means the script itself is not serializable. I created a new object, pasted the same code into it, and it worked.
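The serialization stack above is consistent with that reading: the SerializedLambda generated for lines.map(parseLine) lists the enclosing MinTemperature$ instance among its capturedArgs, so Spark has to serialize the whole surrounding object, whereas the anonymous function references nothing outside its own body and captures nothing. Below is a minimal sketch of a defensive fix, assuming the same CSV layout; the LineParser name, the MinTemperaturesFixed object, and the explicit extends Serializable are illustrative additions, not part of the original code. The idea is to keep the parsing method on a small top-level object that is explicitly Serializable, so the shipped closure stays serializable even if that object does get captured.

package com.sundogsoftware.spark

import org.apache.spark._
import org.apache.log4j._

// Illustrative helper (the name is ours): a top-level, explicitly
// Serializable object, so any closure that captures it can be shipped
// to executors without a NotSerializableException.
object LineParser extends Serializable {
  def parseLine(line: String): (String, String, Float) = {
    val fields = line.split(",")
    val temperature = fields(3).toFloat * 0.1f * (9.0f / 5.0f) + 32.0f
    (fields(0), fields(2), temperature)
  }
}

object MinTemperaturesFixed {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val sc = new SparkContext("local[*]", "MinTemperaturesFixed")
    val lines = sc.textFile("../DataSet/1800.csv")

    // parseLine now lives on a Serializable object, so even if the compiler
    // captures LineParser inside the closure, serialization succeeds.
    val parsedLines = lines.map(LineParser.parseLine)
    parsedLines.take(5).foreach(println)
    sc.stop()
  }
}

With a genuinely top-level object the compiler can usually reach the method through the object's static MODULE$ field instead of capturing it at all, which would also explain why pasting the same code into a fresh standalone object made the error disappear.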
