Spark version: 1.3
I have a requirement in which I process data of type BigInteger. The bean class (POJO) uses a few BigInteger fields. Parsing the data and creating the JavaRDD works fine, but when I create a DataFrame by passing the JavaRDD and the bean class, Spark throws the following exception:
scala.MatchError: class java.math.BigInteger (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1182)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1181)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1181)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:419)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:447)
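For reference, this is roughly the code path that fails. The class and field names below are made up for illustration, and the sketch is written in Scala against the bean-class overload of createDataFrame; any java.math.BigInteger field in the bean triggers the same MatchError while getSchema inspects the bean's fields:

import java.math.BigInteger
import scala.beans.BeanProperty
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical bean; the BigInteger field is what getSchema cannot map.
class AccountBean extends Serializable {
  @BeanProperty var id: BigInteger = _
  @BeanProperty var name: String = _
}

object Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("repro").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    val bean = new AccountBean
    bean.setId(new BigInteger("190753000000000000000"))
    bean.setName("a")

    val rdd = sc.parallelize(Seq(bean)).toJavaRDD()
    // Throws scala.MatchError: class java.math.BigInteger while inferring the schema
    val df = sqlContext.createDataFrame(rdd, classOf[AccountBean])
    df.printSchema()
  }
}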
Using the Spark shell, I found that Scala is able to handle BigInt:
scala> val x = BigInt("190753000000000000000");
x: scala.math.BigInt = 190753000000000000000
I am not sure what is causing the exception. Any help would be appreciated.
Following @zero323's answer, your exception can be resolved by using BigDecimal instead of BigInt in the variable declaration:
scala> val x = BigDecimal("190753000000000000000");
You get the exception because BigInt is not a data type supported by Spark DataFrames. The only supported Big* type is BigDecimal.
You can find the full list of supported types and mappings in the "Data Types" section of the Spark SQL and DataFrame Guide.
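As a quick sanity check (the Account case class below is made up for illustration), a field declared as BigDecimal goes through createDataFrame without the MatchError and is inferred as Spark's DecimalType:

scala> case class Account(id: BigDecimal, name: String)
scala> val df = sqlContext.createDataFrame(Seq(Account(BigDecimal("190753000000000000000"), "a")))
scala> df.printSchema()   // id is inferred as a decimal column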
If you need a custom type instead, you can refer to memSQL's BigInt UserDefinedType:
package com.memsql.spark.connector.dataframe
import org.apache.spark.sql.types._
@SQLUserDefinedType(udt = classOf[BigIntUnsignedType])
class BigIntUnsignedValue(val value: Long) extends Serializable {
  override def toString: String = value.toString
}
/**
* Spark SQL [[org.apache.spark.sql.types.UserDefinedType]] for MemSQL's `BIGINT UNSIGNED` column type.
*/
class BigIntUnsignedType private() extends UserDefinedType[BigIntUnsignedValue] {
  override def sqlType: DataType = LongType

  override def serialize(obj: Any): Long = {
    obj match {
      case x: BigIntUnsignedValue => x.value
      case x: String => x.toLong
      case x: Long => x
    }
  }

  override def deserialize(datum: Any): BigIntUnsignedValue = {
    datum match {
      case x: String => new BigIntUnsignedValue(x.toLong)
      case x: Long => new BigIntUnsignedValue(x)
    }
  }

  override def userClass: Class[BigIntUnsignedValue] = classOf[BigIntUnsignedValue]

  override def asNullable: BigIntUnsignedType = this

  override def typeName: String = "bigint unsigned"
}
case object BigIntUnsignedType extends BigIntUnsignedType
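A hypothetical usage sketch (the Record case class is made up, and this assumes the annotation-based UDT path in Spark 1.3 picks up @SQLUserDefinedType on the value class): the annotated class can then appear as a field, and Spark stores it using the UDT's sqlType (LongType here). Note that a Long-backed UDT cannot hold values beyond Long.MaxValue, so it does not cover arbitrarily large BigInteger values.

// Hypothetical case class using the UDT-annotated value class.
case class Record(id: BigIntUnsignedValue, name: String)

val df = sqlContext.createDataFrame(Seq(
  Record(new BigIntUnsignedValue(190753L), "a"),
  Record(new BigIntUnsignedValue(190754L), "b")))
df.printSchema()   // id is stored via the UDT, backed by its sqlType (LongType)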