I have a DataFrame with an array-of-structs column, and I want to convert it into a table whose columns hold the two values separately.
df.printSchema()
root
|-- range: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- a: long (nullable = true)
| | |-- b: string (nullable = true)
df.show()
+------------+
| range |
+------------+
|[[3, Hello]]|
+------------+
My desired output:
+------------+
| a | b |
+------------+
| 3 | Hello|
+------------+
That is, I want to split the single column into two columns, one per value.
Here is the PySpark version of user deo's Scala answer:
import pyspark.sql.functions as F
j = '{"range":[{"a":3,"b":"Hello"}]}'
df = spark.read.json(sc.parallelize([j]))
# explode turns each element of the array into its own row, leaving a struct column
df = df.withColumn('exploded', F.explode('range'))
# the fields of a struct can then easily be selected with *
df.select('exploded.*').show()
You can also do this as a one-liner:
df.select(F.explode('range')).select('col.*').show()
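For intuition, here is a minimal plain-Python sketch of the same flattening (no Spark required, purely illustrative): each element of the `range` array becomes one row, and the struct fields `a` and `b` become the columns.

```python
import json

j = '{"range":[{"a":3,"b":"Hello"}]}'

# "explode": one output row per array element;
# "select('col.*')": pull the struct fields out as separate columns
rows = [(item["a"], item["b"]) for item in json.loads(j)["range"]]
print(rows)  # [(3, 'Hello')]
```

This mirrors what Spark does distributedly: `F.explode` produces a row per array element, and `select('exploded.*')` (or `select('col.*')` for the default explode column name) flattens the struct fields into top-level columns.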