When I try to read a Spark DataFrame column containing a JSON string as an array, parsing it with a defined schema returns null. I tried Array, Seq, and List for the schema, but all return null. My Spark version is 2.2.0.
val dfdata= spark.sql("""select "[{ "id":"93993", "name":"Phil" }, { "id":"838", "name":"Don" }]" as theJson""")
dfdata.show(5,false)
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val sch = StructType(Array(
  StructField("id", StringType, true),
  StructField("name", StringType, true)))
print(sch.prettyJson)
dfdata.select(from_json($"theJson", sch)).show
And the output:
+---------------------------------------------------------------+
|theJson |
+---------------------------------------------------------------+
|[{ "id":"93993", "name":"Phil" }, { "id":"838", "name":"Don" }]|
+---------------------------------------------------------------+
{
"type" : "struct",
"fields" : [ {
"name" : "id",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "name",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}
+----------------------+
|jsontostructs(theJson)|
+----------------------+
| null|
+----------------------+
Your schema doesn't quite fit your example: the example is an array of structs, not a single struct. Try wrapping it in an ArrayType:
val sch = ArrayType(StructType(Array(
  StructField("id", StringType, true),
  StructField("name", StringType, true)
)))
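To show the difference, here is a minimal sketch that applies this ArrayType schema to the dfdata from the question and flattens the result; the explode step and the imports are my additions and assume a spark-shell style session:

import spark.implicits._
import org.apache.spark.sql.functions.{explode, from_json}
import org.apache.spark.sql.types._

// parse the JSON column with the array-of-structs schema, then expand to one row per element
val people = dfdata
  .select(explode(from_json($"theJson", sch)).as("person"))
  .select($"person.id", $"person.name")
people.show(false)
// expected to print something like:
// +-----+----+
// |id   |name|
// +-----+----+
// |93993|Phil|
// |838  |Don |
// +-----+----+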
Have you tried parsing the JSON string before obtaining the DataFrame?
// obtaining this string should be easy:
val jsonStr = """[{ "id":"93993", "name":"Phil" }, { "id":"838", "name":"Don" }]"""
// then you can take advantage of schema inference
val df2 = spark.read.json(Seq(jsonStr).toDS)
df2.show(false)
// it shows:
// +-----+----+
// |id |name|
// +-----+----+
// |93993|Phil|
// |838 |Don |
// +-----+----+
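If the JSON already sits in a DataFrame column, as in the question, the same inference idea can be applied by first pulling the column out as a Dataset[String]; this sketch assumes Spark 2.2+, where DataFrameReader.json accepts a Dataset[String], and reuses the dfdata from the question:

import spark.implicits._

// extract the JSON column as a Dataset[String], then let Spark infer the schema
val inferred = spark.read.json(dfdata.select($"theJson").as[String])
inferred.show(false)
// should produce the same two-row result as df2 above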