什么样的模式将有助于将这种类型的json解析为Scala中的Spark SQL?



我有兴趣将dataattrbitue 值访问为行,该行中的每个项目都将值分配给本问题底部示例中提到的相应列名。

{
"meta": {
"a": {
"b": []
}
},
"data" : [ [ "row-r9pv-p86t.ifsp", "00000000-0000-0000-0838-60C2FFCC43AE", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "KINGS", "F", "11" ]
, [ "row-7v2v~88z5-44se", "00000000-0000-0000-C8FC-DDD3F9A72DFF", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "SUFFOLK", "F", "6" ]
, [ "row-hzc9-4kvv~mbc9", "00000000-0000-0000-562E-D9A0792557FC", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "MONROE", "F", "6" ]
, [ "row-3473_8cwy~3vez", "00000000-0000-0000-B19D-7B88FF2FB6A0", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "ERIE", "F", "9" ]
, [ "row-tyuh.nmy9.r2n3", "00000000-0000-0000-7D66-E7EC8F12BB8D", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "ULSTER", "F", "5" ]
, [ "row-ct48~ui69-2zsn", "00000000-0000-0000-7ECC-F350540A8F92", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "WESTCHESTER", "F", "24" ]
, [ "row-gdva~4v8k-vuwy", "00000000-0000-0000-30FB-CB5E36017AD5", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "BRONX", "F", "13" ]
, [ "row-gzu3~a7hk~bqym", "00000000-0000-0000-E380-AAAB1FA5C7A7", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "NEW YORK", "F", "55" ]
, [ "row-ekbw_tb7c.yvgp", "00000000-0000-0000-A7FF-8A4260B3A505", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "NASSAU", "F", "15" ]
, [ "row-zk7s-r2ma_t8mk", "00000000-0000-0000-3F7C-4DECA15E0F5B", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "ERIE", "F", "6" ]
, [ "row-ieja_864x~w2ki", "00000000-0000-0000-854E-D29D5B4D5636", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "SUFFOLK", "F", "14" ]
, [ "row-8fp4.rjtj.h46h", "00000000-0000-0000-C177-43F52BFECC07", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "KINGS", "F", "34" ]
]
}

我尝试遵循,但它只为每行提供空值。您能否帮助如何将行中的每个项目放入特定字段,例如将每个值分配给下面值右侧命名的属性。

val schema = new StructType()
.add(
"data", new ArrayType(new StructType(), false), false
)
val nestDF = spark.read.schema(schema).json("dbfs:/tmp/rows.json")

这是预期的结构:

/* [ 
"row-r9pv-p86t.ifsp" <--  sid
"00000000-0000-0000-0838-60C2FFCC43AE" <-- id
0 <-- position
1574264158 <-- created_at
null <-- created_meta
1574264158 <-- updated_at
null <-- updated_meta
"{ }" <-- meta
"2007" <-- year of birth
"ZOEY" <-- child's first name
"KINGS" <-- county
"F" <-- gender
"11" <-- count
] 
*/

Atharva,你可以试试这段代码。我没有将属性转换为预期的数据类型,但现在应该很容易:):

import sparkSession.implicits._
import org.apache.spark.sql.functions._
val df = sparkSession.read.option("multiLine", true).json("src/main/resources/json.json")
val schema = StructType(Seq("sid","id","position","created_at","created_meta","updated_at","updated_meta","meta","yearOfBirth","childsFirstName","county","gender","count").map(c => StructField(c, StringType)))
val toStruct = udf({seq: Seq[String] => Row.fromSeq(seq)}, schema)
val newDF = df.select(explode($"data").as("dataRow"))
.select(toStruct($"dataRow").as("struct"))
.select("struct.*")
newDF.printSchema()
root
|-- sid: string (nullable = true)
|-- id: string (nullable = true)
|-- position: string (nullable = true)
|-- created_at: string (nullable = true)
|-- created_meta: string (nullable = true)
|-- updated_at: string (nullable = true)
|-- updated_meta: string (nullable = true)
|-- meta: string (nullable = true)
|-- yearOfBirth: string (nullable = true)
|-- childsFirstName: string (nullable = true)
|-- county: string (nullable = true)
|-- gender: string (nullable = true)
|-- count: string (nullable = true)

newDF.show(false)
+------------------+------------------------------------+--------+----------+------------+----------+------------+----+-----------+---------------+-----------+------+-----+
|sid               |id                                  |position|created_at|created_meta|updated_at|updated_meta|meta|yearOfBirth|childsFirstName|county     |gender|count|
+------------------+------------------------------------+--------+----------+------------+----------+------------+----+-----------+---------------+-----------+------+-----+
|row-r9pv-p86t.ifsp|00000000-0000-0000-0838-60C2FFCC43AE|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOEY           |KINGS      |F     |11   |
|row-7v2v~88z5-44se|00000000-0000-0000-C8FC-DDD3F9A72DFF|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOEY           |SUFFOLK    |F     |6    |
|row-hzc9-4kvv~mbc9|00000000-0000-0000-562E-D9A0792557FC|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOEY           |MONROE     |F     |6    |
|row-3473_8cwy~3vez|00000000-0000-0000-B19D-7B88FF2FB6A0|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOEY           |ERIE       |F     |9    |
|row-tyuh.nmy9.r2n3|00000000-0000-0000-7D66-E7EC8F12BB8D|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |ULSTER     |F     |5    |
|row-ct48~ui69-2zsn|00000000-0000-0000-7ECC-F350540A8F92|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |WESTCHESTER|F     |24   |
|row-gdva~4v8k-vuwy|00000000-0000-0000-30FB-CB5E36017AD5|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |BRONX      |F     |13   |
|row-gzu3~a7hk~bqym|00000000-0000-0000-E380-AAAB1FA5C7A7|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |NEW YORK   |F     |55   |
|row-ekbw_tb7c.yvgp|00000000-0000-0000-A7FF-8A4260B3A505|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |NASSAU     |F     |15   |
|row-zk7s-r2ma_t8mk|00000000-0000-0000-3F7C-4DECA15E0F5B|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |ERIE       |F     |6    |
|row-ieja_864x~w2ki|00000000-0000-0000-854E-D29D5B4D5636|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |SUFFOLK    |F     |14   |
|row-8fp4.rjtj.h46h|00000000-0000-0000-C177-43F52BFECC07|0       |1574264158|null        |1574264158|null        |{ } |2007       |ZOE            |KINGS      |F     |34   |
+------------------+------------------------------------+--------+----------+------------+----------+------------+----+-----------+---------------+-----------+------+-----+

最新更新