I get the error java.lang.IllegalArgumentException: Can't get JDBC type for null when I try to run the following code.
Example:
...
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder
  .master("local[*]")
  .appName("Demo")
  .getOrCreate()
import spark.implicits._
// load the first table
val df_one = spark.read
  .format("jdbc")
  .option("url", myDbUrl)
  .option("dbtable", myTableOne)
  .option("user", myUser)
  .option("password", myPassw)
  .load()
df_one.createGlobalTempView("table_one")
// load the second table
val df_two = spark.read
  .format("jdbc")
  .option("url", myDbUrl)
  .option("dbtable", myTableTwo)
  .option("user", myUser)
  .option("password", myPassw)
  .load()
df_two.createGlobalTempView("table_two")
// perform the join of the two tables
val df_result = spark.sql(
  "select o.field_one, t.field_two, null as field_three " +
  "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
)
// Error occurs here:
df_result.write
  .format("jdbc")
  .option("dbtable", myResultTable)
  .option("url", myDbUrl)
  .option("user", myUser)
  .option("password", myPassw)
  .mode(SaveMode.Append)
  .save()
...
I get this error:
Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for null
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:147)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:663)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:662)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:662)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
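
The stack trace points at JdbcUtils.getJdbcType: the bare null in the select list is typed as NullType, and Spark has no JDBC mapping for NullType when it builds the INSERT statement. You can confirm the problematic schema directly (a minimal sketch; the column name null_col is arbitrary):

// The null literal comes back as NullType in the DataFrame schema,
// which is exactly the type JdbcUtils cannot translate to a JDBC type.
spark.sql("select null as null_col").printSchema()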
A workaround that fixes it, but slows the job down considerably:
...
// create a case class for the Dataset
case class ResultCaseClass(field_one: Option[Int], field_two: Option[Int], field_three: Option[Int])

import org.apache.spark.sql.types.IntegerType

// perform the join of the two tables, then force a concrete type on every column
val ds_result = spark.sql(
  "select o.field_one, t.field_two, null as field_three " +
  "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
)
  .withColumn("field_one", $"field_one".cast(IntegerType))
  .withColumn("field_two", $"field_two".cast(IntegerType))
  .withColumn("field_three", $"field_three".cast(IntegerType))
  .as[ResultCaseClass]
//Success:
ds_result.write......
...
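
A lighter alternative, assuming field_three is meant to be an integer column in the target table, is to cast the null inside the query itself, so the join result already carries a concrete type and no case class or extra withColumn passes are needed:

// Sketch: cast the null to the target column's type directly in the SQL
// (int is an assumption here; match it to the actual column type).
val df_result = spark.sql(
  "select o.field_one, t.field_two, cast(null as int) as field_three " +
  "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
)
// field_three is now IntegerType, so the JDBC write no longer fails.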
I ran into the same problem you did, and traced the error message in the source code: "Can't get JDBC type for null" is thrown when you try to insert a null into the database without specifying a concrete data type. The fix is to cast the null to the data type of the corresponding database column.
Example:
lit(null).cast(StringType) or lit(null).cast("string")
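
Applied to the question's code, the cast looks like this (a sketch assuming field_three is an integer column; adjust the type to match your table):

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.IntegerType

// Replace the untyped null column with one cast to the target column's type.
val df_fixed = df_result.withColumn("field_three", lit(null).cast(IntegerType))
// df_fixed.write... now succeeds because every column has a concrete, JDBC-mappable type.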