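How do I create a schema for the JSON below so that I can read it with an explicit schema? I am using hiveContext.read.schema().json("input.json"), and I want to ignore the first two fields, "ErrorMessage" and "IsError", and read only Report. Here is the JSON: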
{
"ErrorMessage": null,
"IsError": false,
"Report":{
"tl":[
{
"TlID":"F6",
"CID":"mo"
},
{
"TlID":"Fk",
"CID":"mo"
}
]
}
}
I created the following schema:
val schema = StructType(
Array(
StructField("Report", StructType(
Array(
StructField
("tl",ArrayType(StructType(Array(
StructField("TlID", StringType),
StructField("CID", IntegerType)
)))))))))
Below is the output of json.printSchema():
root
|-- Report: struct (nullable = true)
| |-- tl: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- TlID: string (nullable = true)
| | | |-- CID: integer (nullable = true)
The schema is incorrect. CID in your data is clearly not an integer (it is "mo"), so it should be declared as StringType. Use:
import org.apache.spark.sql.types._
val schema = StructType(Array(
StructField("Report", StructType(
Array(
StructField
("tl",ArrayType(StructType(Array(
StructField("CID", StringType),
StructField("TlID", StringType)
)))))))))
and:
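// .toDS needs the session implicits; spark-shell imports them automatically.
import spark.implicits._
val df = Seq("""{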
"ErrorMessage": null,
"IsError": false,
"Report":{
"tl":[
{
"TlID":"F6",
"CID":"mo"
},
{
"TlID":"Fk",
"CID":"mo"
}
]
}
}""").toDS
spark.read.schema(schema).json(df).show(false)
+--------------------------------+
|Report |
+--------------------------------+
|[WrappedArray([mo,F6], [mo,Fk])]|
+--------------------------------+
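If one row per tl element is more convenient than a single nested Report column, the array can be flattened after reading. The following is only a sketch, assuming Spark 2.2 or later (for reading JSON from a Dataset[String]) and a local session; the object name FlattenReport and the helper flatten are hypothetical names introduced for illustration, not part of the question or the answer.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types._

object FlattenReport {
  // Same schema as in the answer: only Report is declared, so
  // ErrorMessage and IsError are never read into the DataFrame.
  val schema: StructType = StructType(Array(
    StructField("Report", StructType(Array(
      StructField("tl", ArrayType(StructType(Array(
        StructField("CID", StringType),
        StructField("TlID", StringType)
      ))))
    )))
  ))

  // Hypothetical helper: one output row per element of Report.tl.
  def flatten(spark: SparkSession, json: Dataset[String]): DataFrame =
    spark.read.schema(schema).json(json)
      .select(explode(col("Report.tl")).as("tl")) // array -> one row per struct
      .select(col("tl.TlID"), col("tl.CID"))      // struct -> plain columns

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder().master("local[1]").appName("flatten-report").getOrCreate()
    import spark.implicits._
    val input = Seq(
      """{"ErrorMessage": null, "IsError": false, "Report": {"tl": [{"TlID": "F6", "CID": "mo"}, {"TlID": "Fk", "CID": "mo"}]}}"""
    ).toDS
    flatten(spark, input).show(false)
    spark.stop()
  }
}
```

In spark-shell the body of flatten can be applied directly to the df defined above, since spark and its implicits are already in scope there.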
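For a column with datatype array<struct<metrics_name:string,metrics_value:string>>, the field can be declared as:
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}
// Field names follow the datatype stated above (metrics_name, metrics_value).
StructField("usage_metrics", ArrayType(StructType(
Array(
StructField("metrics_name", StringType, true),
StructField("metrics_value", StringType, true)
)
)))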
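When the datatype is already available as a DDL-style string, it may be easier to let Spark parse it than to nest the constructors by hand. The sketch below assumes Spark 2.2 or later, where StructType.fromDDL is available; the object name UsageMetricsSchema is a hypothetical name used for illustration, and the field names are taken from the datatype string quoted above.

```scala
import org.apache.spark.sql.types._

object UsageMetricsSchema {
  // Column name followed by the datatype string quoted above.
  val ddl: String =
    "usage_metrics array<struct<metrics_name:string,metrics_value:string>>"

  // Let Spark parse the DDL text into a StructType.
  val fromDdl: StructType = StructType.fromDDL(ddl)

  // Hand-built version of the same schema, for comparison.
  val byHand: StructType = StructType(Array(
    StructField("usage_metrics", ArrayType(StructType(Array(
      StructField("metrics_name", StringType, true),
      StructField("metrics_value", StringType, true)
    ))))
  ))

  def main(args: Array[String]): Unit = {
    // Print both trees so they can be compared side by side.
    println(fromDdl.treeString)
    println(byHand.treeString)
  }
}
```

Either value can be passed to spark.read.schema(...). The DDL form is shorter, while the explicit constructors make nullability and element types visible at a glance.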