java.lang.RuntimeException: java.lang.String is not a valid external type for schema of bigint or int

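I am reading the schema of a DataFrame from a text file. Each line holds a column name, its position, and its type. The file looks like this: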

id,1,bigint
price,2,bigint
sqft,3,bigint
zip_id,4,int
name,5,string

I am mapping the parsed type names to Spark SQL data types. The code that creates the DataFrame is:

import scala.collection.mutable.ListBuffer
import scala.io.Source
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

var schemaSt = new ListBuffer[(String, String)]()
// Read (column name, type name) pairs from the schema file
for (line <- Source.fromFile("meta.txt").getLines()) {
  val word = line.split(",")
  schemaSt += ((word(0), word(2)))
}
// Map type names to Spark SQL data types; unrecognized names fall back to StringType
val types = Map("int" -> IntegerType, "bigint" -> LongType)
  .withDefault(_ => StringType)
val schemaChanged = schemaSt.map(x => (x._1, types(x._2)))
// Read the data source
val lines = spark.sparkContext.textFile("data source path")
val fields = schemaChanged.map(x => StructField(x._1, x._2, nullable = true)).toList
val schema = StructType(fields)
// Split each line on tabs and wrap the resulting strings in a Row
val rowRDD = lines
  .map(_.split("\t"))
  .map(attributes => Row.fromSeq(attributes))
// Apply the schema to the RDD
val new_df = spark.createDataFrame(rowRDD, schema)
new_df.show(5)
new_df.printSchema()

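But this only works for columns of string type. For IntegerType and LongType it throws these exceptions:

java.lang.RuntimeException: java.lang.String is not a valid external type for schema of int

and

java.lang.RuntimeException: java.lang.String is not a valid external type for schema of bigint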

Thanks in advance!

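I had the same problem, and the cause is the Row.fromSeq() call.

If it is called on an array of String, the resulting Row is a row of Strings. That does not match the columns your schema declares as bigint or int.

To get a valid DataFrame from Row.fromSeq(values: Seq[Any]), the elements of the values argument must have the types that correspond to your schema.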
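
A minimal sketch of that conversion, assuming only the three types from the question (int, bigint, string) and one tab-separated value per schema field on every line. convertField and toTypedRow are hypothetical helper names, not part of Spark:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical helper: turn one raw string into the external type
// that Spark expects for the given column type.
def convertField(raw: String, dataType: DataType): Any = dataType match {
  case IntegerType => raw.trim.toInt   // int    -> Int
  case LongType    => raw.trim.toLong  // bigint -> Long
  case _           => raw              // everything else stays a String
}

// Hypothetical helper: pair each value with its schema field and convert it.
// zip stops at the shorter side, so extra values or fields are silently dropped.
def toTypedRow(attributes: Array[String], schema: StructType): Row =
  Row.fromSeq(
    attributes.zip(schema.fields)
      .map { case (raw, field) => convertField(raw, field.dataType) }
      .toSeq
  )
```

With these in scope, the rowRDD step from the question would become lines.map(_.split("\t")).map(attrs => toTypedRow(attrs, schema)). Empty or non-numeric values throw a NumberFormatException in this sketch, so real data may need extra handling.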

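You are trying to store strings in columns of numeric type.

When parsing, you need to convert the string-encoded numeric data to the appropriate numeric types.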
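
A different route from the one the answers describe: instead of building Rows by hand, you could pass the schema to Spark's built-in CSV reader and let it do the parsing. A rough sketch, assuming the data is tab-separated text with no header line. readTyped is a hypothetical wrapper name:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical wrapper: read a tab-separated file and have the CSV reader
// parse every column according to the supplied schema.
def readTyped(spark: SparkSession, path: String, schema: StructType): DataFrame =
  spark.read
    .option("sep", "\t") // the field separator assumed for the data file
    .schema(schema)      // the StructType built from meta.txt
    .csv(path)
```

This would replace the textFile, split, and Row.fromSeq steps; the schema built from meta.txt is reused as is.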
