Task not serializable when iterating through a DataFrame (Scala)

Below is my code, in which I try to iterate over each row:

val df: DataFrame = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use the first line of each file as the header
  .option("delimiter", TILDE)
  .option("inferSchema", "true") // Automatically infer data types
  .load(fileName._2)
val accGrpCountsIds: DataFrame = df.groupBy("accgrpid").count()
LOGGER.info(s"DataFrame Count - ${accGrpCountsIds.count()}")
accGrpCountsIds.show(3)
// Switch based on file names and update the model.
accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  val rowCount = accGrpRow.getLong(1) // count() produces a LongType column, so getInt would fail
})

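When I try to iterate over the DataFrame above using foreach, I get a "Task not serializable" error. How can I do this?

Do you have anything else inside the foreach? Or is this exactly what you ran, and it does not work?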

accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  val rowCount = accGrpRow.getLong(1) // count() produces a LongType column, so getInt would fail
})

Also, you may find this useful: "Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects"
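The situation that linked question describes can be reproduced without Spark. The sketch below is only an illustration, and it assumes the real foreach body touches a field or method of the enclosing class (for example the logger or the model being updated). `ModelUpdater`, `threshold`, and `ClosureCheck` are hypothetical names that do not come from the question. The helper attempts plain Java serialization on a closure, which is roughly the kind of check Spark makes before shipping a task to executors.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the class that contains the foreach call.
// Note that it does NOT extend Serializable.
class ModelUpdater {
  val threshold: Long = 10L

  // `threshold` here means `this.threshold`, so the closure captures `this`
  // and the whole non-serializable ModelUpdater instance comes along with it.
  def capturesThis: Long => Boolean = count => count > threshold

  // Copying the field into a local val first means the closure captures
  // only a Long, not the enclosing instance.
  def capturesLocal: Long => Boolean = {
    val localThreshold = threshold
    count => count > localThreshold
  }
}

object ClosureCheck {
  // Returns true if Java serialization accepts the closure, false if it
  // fails with NotSerializableException.
  def isSerializable(f: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val updater = new ModelUpdater
    println(s"capturesThis serializable:  ${isSerializable(updater.capturesThis)}")
    println(s"capturesLocal serializable: ${isSerializable(updater.capturesLocal)}")
  }
}
```

If this is the cause, the usual remedies are to copy whatever the closure needs into local vals before calling foreach, or, since the grouped result is small, to bring it to the driver with `accGrpCountsIds.collect()` and update the model there in an ordinary loop.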
