Using Spark with a recursive case class

I have a recursive data structure, and Spark fails with this error:

Exception in thread "main" java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class BulletPoint

Here is a minimal example that reproduces it:

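import org.apache.spark.sql.SparkSession

// Recursive type: each BulletPoint holds a list of BulletPoints.
case class BulletPoint(item: String, children: List[BulletPoint])

object TestApp extends App {
  val sparkSession = SparkSession
    .builder()
    .appName("spark app")
    .master("local")
    .getOrCreate()
  import sparkSession.implicits._
  // Deriving the implicit encoder here fails with the UnsupportedOperationException.
  sparkSession.createDataset(List(BulletPoint("1", Nil), BulletPoint("2", Nil)))
}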

Does anyone know how to solve this?

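The exception is pretty explicit: this case is not supported by default. Keep in mind that Datasets are encoded as relational schemas, so every required field has to be declared up front and be bounded. There is no place for recursive structures here.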
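To make "declared up front and bounded" concrete, here is a minimal sketch of one way to turn the recursive shape into flat rows. It is an illustration under stated assumptions, not something the answer prescribes: `FlatBullet`, `Flatten`, and the `id`/`parentId` fields are hypothetical names, and the recursion is not stack-safe for very deep trees.

```scala
// Recursive shape from the question.
case class BulletPoint(item: String, children: List[BulletPoint])

// Hypothetical flat row: a node refers to its parent by id instead of nesting.
case class FlatBullet(id: Int, parentId: Option[Int], item: String)

object Flatten {
  // Depth-first walk that assigns ids in visiting order (parents before children).
  def flatten(roots: List[BulletPoint]): List[FlatBullet] = {
    var nextId = 0
    def go(node: BulletPoint, parent: Option[Int]): List[FlatBullet] = {
      val id = nextId
      nextId += 1
      FlatBullet(id, parent, node.item) :: node.children.flatMap(go(_, Some(id)))
    }
    roots.flatMap(go(_, None))
  }
}
```

Because `FlatBullet` contains only an `Int`, an `Option[Int]`, and a `String`, a `List[FlatBullet]` should be accepted by `createDataset` with the standard `sparkSession.implicits._` encoders; the tree can be rebuilt from `parentId` after collecting, if needed.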

There is one small loophole, though: binary Encoders.

import org.apache.spark.sql.{Encoder, Encoders}
sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))(Encoders.kryo[BulletPoint])

Or, equivalently:

implicit val bulletPointEncoder: Encoder[BulletPoint] = Encoders.kryo[BulletPoint]
sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))

But this is really not something you want in your code unless it is absolutely necessary.
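One reason for that caution can be seen by inspecting the encoder's schema. The sketch below assumes Spark SQL is on the classpath; `KryoSchemaCheck` is a hypothetical name, and `BulletPoint` is redefined only so the snippet stands alone.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.BinaryType

case class BulletPoint(item: String, children: List[BulletPoint])

object KryoSchemaCheck extends App {
  // No SparkSession is needed just to look at the encoder's schema.
  val schema = Encoders.kryo[BulletPoint].schema

  // The whole object is serialized into one binary column, so `item` and
  // `children` are not visible to Spark SQL as separate columns.
  println(schema.fieldNames.mkString(", "))
  println(schema.head.dataType == BinaryType)
}
```

With a Kryo-encoded Dataset, typed operations such as `map` and `filter` on the deserialized objects still work, but column-based operations on individual fields (for example selecting `item`) are not available, because Spark only sees an opaque blob.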
