Creating a DataFrame with BigInteger throws scala.MatchError: class java.math.BigInteger

Spark version: 1.3

I have a requirement where I process data of type BigInteger. The bean class (POJO) uses a few BigInteger fields. Parsing the data and creating the JavaRDD works fine, but when I create a DataFrame by passing the JavaRDD and the bean class as arguments, Spark throws the following exception:

scala.MatchError: class java.math.BigInteger (of class java.lang.Class)
        at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1182)
        at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1181)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1181)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:419)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:447)
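The failing setup described above can be sketched with a minimal bean (class and field names here are hypothetical); the `java.math.BigInteger` field is what `SQLContext.getSchema` fails to match:

```java
import java.io.Serializable;
import java.math.BigInteger;

// Hypothetical bean (POJO) of the kind described in the question.
// Passing it as sqlContext.createDataFrame(javaRdd, AccountBean.class)
// makes getSchema hit the unsupported java.math.BigInteger field type.
public class AccountBean implements Serializable {
    private BigInteger balance;

    public BigInteger getBalance() { return balance; }
    public void setBalance(BigInteger balance) { this.balance = balance; }
}
```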

Using the Spark shell, I found that Scala itself handles BigInt fine:

scala> val x = BigInt("190753000000000000000");
x: scala.math.BigInt = 190753000000000000000

I am not sure what is causing the exception. Any help would be appreciated.

Following @zero232's answer, your exception can be resolved by using BigDecimal instead of BigInt in the variable declaration:

scala> val x = BigDecimal("190753000000000000000");

You get the exception because BigInt is not a data type supported by Spark DataFrames. The only supported Big* type is BigDecimal.
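In the Java bean scenario from the question, one workaround (a sketch; variable names are hypothetical) is to convert BigInteger values to `java.math.BigDecimal` before building the DataFrame, since BigDecimal maps to Spark SQL's DecimalType:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigIntToDecimal {
    public static void main(String[] args) {
        // new BigDecimal(BigInteger) is exact: scale 0, no precision loss.
        BigInteger raw = new BigInteger("190753000000000000000");
        BigDecimal converted = new BigDecimal(raw);
        System.out.println(converted); // prints 190753000000000000000

        // Round-trip check: toBigIntegerExact() recovers the original value.
        System.out.println(raw.equals(converted.toBigIntegerExact())); // prints true
    }
}
```

Changing the bean's field type (or adding a BigDecimal accessor) this way keeps the full integer value while giving Spark a schema type it can match.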

You can find the full list of supported types and mappings in the "Data Types" section of the Spark SQL and DataFrame Guide.

Alternatively, you can refer to memSQL's BigInt UserDefinedType:

    package com.memsql.spark.connector.dataframe
    import org.apache.spark.sql.types._

    @SQLUserDefinedType(udt = classOf[BigIntUnsignedType])
    class BigIntUnsignedValue(val value: Long) extends Serializable {
      override def toString: String = value.toString
    }
    /**
     * Spark SQL [[org.apache.spark.sql.types.UserDefinedType]] for MemSQL's `BIGINT UNSIGNED` column type.
     */
    class BigIntUnsignedType private() extends UserDefinedType[BigIntUnsignedValue] {
      override def sqlType: DataType = LongType
      override def serialize(obj: Any): Long = {
        obj match {
          case x: BigIntUnsignedValue => x.value
          case x: String       => x.toLong
          case x: Long         => x
        }
      }
      override def deserialize(datum: Any): BigIntUnsignedValue = {
        datum match {
          case x: String => new BigIntUnsignedValue(x.toLong)
          case x: Long => new BigIntUnsignedValue(x)
        }
      }
      override def userClass: Class[BigIntUnsignedValue] = classOf[BigIntUnsignedValue]
      override def asNullable: BigIntUnsignedType = this
      override def typeName: String = "bigint unsigned"
    }
    case object BigIntUnsignedType extends BigIntUnsignedType
