How to use a case object as a field of a case class and convert it to a Spark Dataset



I am learning Spark SQL and trying to apply a filter to a Dataset I have created. I defined a simple Employee case class with the fields name, salary, age, and dpt.

case class Employee( name: String, salary: Double, age: Int, dpt: Dept)

The last field, dpt, is defined as follows:


sealed trait Dept { val name: String }
case object Accountability extends Dept { override val name = "AC"}
case object Sales extends Dept { override val name = "S"}
case object Finance extends Dept { override val name = "F"}
case object Marketing extends Dept { override val name = "M"}
case object Communication extends Dept { override val name = "C"}
case object Reception extends Dept { override val name = "R"}
case object HumanResource extends Dept { override val name = "HR"}
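
For reference, this is roughly what I am trying to do; the SparkSession setup and the sample values below are only illustrative assumptions, not my real code.

import org.apache.spark.sql.SparkSession

// illustrative local session; my real application builds its SparkSession elsewhere
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("employee-dept-example")
  .getOrCreate()
import spark.implicits._

val e1 = Employee("Alice", 3000.0, 30, Sales)
val e2 = Employee("Bob", 2500.0, 41, Finance)

// this is where it breaks: Spark cannot derive an Encoder for the dpt field,
// so building the Dataset fails with "No Encoder found for Dept"
val employees = Seq(e1, e2).toDS()

// the filter below is what I ultimately want to run
employees.filter($"salary" > 2800.0).show()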

I tried to solve this with a Kryo encoder, but it does not work.

object DeptEncoders {
  implicit def deptEncoder: org.apache.spark.sql.Encoder[Dept] =
    org.apache.spark.sql.Encoders.kryo[Dept]
}
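
My understanding (which may well be wrong) is that an implicit Encoder[Dept] on its own does not help here, because toDS on a Seq[Employee] needs an Encoder[Employee], and the case-class encoder Spark derives by reflection does not fall back to a field-level Kryo encoder. Sketching it with the session from above:

import DeptEncoders._     // brings the implicit Encoder[Dept] into scope
import spark.implicits._  // spark is the SparkSession from the sketch above

// still fails with "No Encoder found for Dept": the Employee encoder is
// derived field by field and never consults the implicit Kryo Encoder[Dept]
val employees = Seq(
  Employee("Alice", 3000.0, 30, Sales),
  Employee("Bob", 2500.0, 41, Finance)
).toDS()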

Based on the documentation here:

import org.apache.spark.sql.Encoder
...
// conf is your org.apache.spark.SparkConf used to create your Spark Context
conf.registerKryoClasses(Array(classOf[Dept], classOf[Employee]))
...
implicit val encoder1:Encoder[Dept] = org.apache.spark.sql.Encoders.kryo[Dept]
implicit val encoder2:Encoder[Employee] = org.apache.spark.sql.Encoders.kryo[Employee]
...
val df = Seq(e1, e2).toDF()
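
If I understand the Kryo encoder correctly, then continuing from the snippet above (e1 and e2 being the two sample employees from my sketch earlier) the Dataset is created, but each Employee is serialised into a single binary column, so column-based filters are no longer possible:

import spark.implicits._  // needed for the toDS conversion

// with the Kryo Encoder[Employee] in scope, the rows are stored as one
// binary "value" column instead of name/salary/age/dpt columns
val ds = Seq(e1, e2).toDS()
ds.printSchema()
// root
//  |-- value: binary (nullable = true)

// ds.filter($"salary" > 2800.0) therefore fails to resolve, because there is
// no "salary" column; only typed operations on the deserialised objects work:
ds.filter(_.salary > 2800.0).show()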
