我有兴趣将data
attrbitue 值访问为行,该行中的每个项目都将值分配给本问题底部示例中提到的相应列名。
{
"meta": {
"a": {
"b": []
}
},
"data" : [ [ "row-r9pv-p86t.ifsp", "00000000-0000-0000-0838-60C2FFCC43AE", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "KINGS", "F", "11" ]
, [ "row-7v2v~88z5-44se", "00000000-0000-0000-C8FC-DDD3F9A72DFF", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "SUFFOLK", "F", "6" ]
, [ "row-hzc9-4kvv~mbc9", "00000000-0000-0000-562E-D9A0792557FC", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "MONROE", "F", "6" ]
, [ "row-3473_8cwy~3vez", "00000000-0000-0000-B19D-7B88FF2FB6A0", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOEY", "ERIE", "F", "9" ]
, [ "row-tyuh.nmy9.r2n3", "00000000-0000-0000-7D66-E7EC8F12BB8D", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "ULSTER", "F", "5" ]
, [ "row-ct48~ui69-2zsn", "00000000-0000-0000-7ECC-F350540A8F92", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "WESTCHESTER", "F", "24" ]
, [ "row-gdva~4v8k-vuwy", "00000000-0000-0000-30FB-CB5E36017AD5", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "BRONX", "F", "13" ]
, [ "row-gzu3~a7hk~bqym", "00000000-0000-0000-E380-AAAB1FA5C7A7", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "NEW YORK", "F", "55" ]
, [ "row-ekbw_tb7c.yvgp", "00000000-0000-0000-A7FF-8A4260B3A505", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "NASSAU", "F", "15" ]
, [ "row-zk7s-r2ma_t8mk", "00000000-0000-0000-3F7C-4DECA15E0F5B", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "ERIE", "F", "6" ]
, [ "row-ieja_864x~w2ki", "00000000-0000-0000-854E-D29D5B4D5636", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "SUFFOLK", "F", "14" ]
, [ "row-8fp4.rjtj.h46h", "00000000-0000-0000-C177-43F52BFECC07", 0, 1574264158, null, 1574264158, null, "{ }", "2007", "ZOE", "KINGS", "F", "34" ]
]
}
我尝试遵循,但它只为每行提供空值。您能否帮助如何将行中的每个项目放入特定字段,例如将每个值分配给下面值右侧命名的属性。
val schema = new StructType()
.add(
"data", new ArrayType(new StructType(), false), false
)
val nestDF = spark.read.schema(schema).json("dbfs:/tmp/rows.json")
这是预期的结构:
/* [
"row-r9pv-p86t.ifsp" <-- sid
"00000000-0000-0000-0838-60C2FFCC43AE" <-- id
0 <-- position
1574264158 <-- created_at
null <-- created_meta
1574264158 <-- updated_at
null <-- updated_meta
"{ }" <-- meta
"2007" <-- year of birth
"ZOEY" <-- child's first name
"KINGS" <-- county
"F" <-- gender
"11" <-- count
]
*/
Atharva,你可以试试这段代码。我没有将属性转换为预期的数据类型,但现在应该很容易:):
import sparkSession.implicits._
import org.apache.spark.sql.functions._
val df = sparkSession.read.option("multiLine", true).json("src/main/resources/json.json")
val schema = StructType(Seq("sid","id","position","created_at","created_meta","updated_at","updated_meta","meta","yearOfBirth","childsFirstName","county","gender","count").map(c => StructField(c, StringType)))
val toStruct = udf({seq: Seq[String] => Row.fromSeq(seq)}, schema)
val newDF = df.select(explode($"data").as("dataRow"))
.select(toStruct($"dataRow").as("struct"))
.select("struct.*")
newDF.printSchema()
root
|-- sid: string (nullable = true)
|-- id: string (nullable = true)
|-- position: string (nullable = true)
|-- created_at: string (nullable = true)
|-- created_meta: string (nullable = true)
|-- updated_at: string (nullable = true)
|-- updated_meta: string (nullable = true)
|-- meta: string (nullable = true)
|-- yearOfBirth: string (nullable = true)
|-- childsFirstName: string (nullable = true)
|-- county: string (nullable = true)
|-- gender: string (nullable = true)
|-- count: string (nullable = true)
newDF.show(false)
+------------------+------------------------------------+--------+----------+------------+----------+------------+----+-----------+---------------+-----------+------+-----+
|sid |id |position|created_at|created_meta|updated_at|updated_meta|meta|yearOfBirth|childsFirstName|county |gender|count|
+------------------+------------------------------------+--------+----------+------------+----------+------------+----+-----------+---------------+-----------+------+-----+
|row-r9pv-p86t.ifsp|00000000-0000-0000-0838-60C2FFCC43AE|0 |1574264158|null |1574264158|null |{ } |2007 |ZOEY |KINGS |F |11 |
|row-7v2v~88z5-44se|00000000-0000-0000-C8FC-DDD3F9A72DFF|0 |1574264158|null |1574264158|null |{ } |2007 |ZOEY |SUFFOLK |F |6 |
|row-hzc9-4kvv~mbc9|00000000-0000-0000-562E-D9A0792557FC|0 |1574264158|null |1574264158|null |{ } |2007 |ZOEY |MONROE |F |6 |
|row-3473_8cwy~3vez|00000000-0000-0000-B19D-7B88FF2FB6A0|0 |1574264158|null |1574264158|null |{ } |2007 |ZOEY |ERIE |F |9 |
|row-tyuh.nmy9.r2n3|00000000-0000-0000-7D66-E7EC8F12BB8D|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |ULSTER |F |5 |
|row-ct48~ui69-2zsn|00000000-0000-0000-7ECC-F350540A8F92|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |WESTCHESTER|F |24 |
|row-gdva~4v8k-vuwy|00000000-0000-0000-30FB-CB5E36017AD5|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |BRONX |F |13 |
|row-gzu3~a7hk~bqym|00000000-0000-0000-E380-AAAB1FA5C7A7|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |NEW YORK |F |55 |
|row-ekbw_tb7c.yvgp|00000000-0000-0000-A7FF-8A4260B3A505|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |NASSAU |F |15 |
|row-zk7s-r2ma_t8mk|00000000-0000-0000-3F7C-4DECA15E0F5B|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |ERIE |F |6 |
|row-ieja_864x~w2ki|00000000-0000-0000-854E-D29D5B4D5636|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |SUFFOLK |F |14 |
|row-8fp4.rjtj.h46h|00000000-0000-0000-C177-43F52BFECC07|0 |1574264158|null |1574264158|null |{ } |2007 |ZOE |KINGS |F |34 |
+------------------+------------------------------------+--------+----------+------------+----------+------------+----+-----------+---------------+-----------+------+-----+