I have a DataFrame with a map column:
sdf = spark.createDataFrame(
[
(1, {'Kira':25,'Lilly':15}),
(2, {'Tom':14}),
],
["id", "label"]
)
+---+-------------------------+
|id |label                    |
+---+-------------------------+
|1  |{Lilly -> 15, Kira -> 25}|
|2  |{Tom -> 14}              |
+---+-------------------------+
I want the keys in one column and the values in another, like this:
+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Kira |25 |
|1  |Lilly|15 |
|2  |Tom  |14 |
+---+-----+---+
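The long way: use the map collection functions map_keys and map_values to create the name and age array columns, then zip the two arrays and explode them into rows with the inline function.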
from pyspark.sql.functions import map_keys, map_values

(sdf.withColumn('name', map_keys('label'))
    .withColumn('age', map_values('label'))
    .selectExpr('id', 'inline(arrays_zip(name, age))')
    .show())
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Lilly| 15|
|  1| Kira| 25|
|  2|  Tom| 14|
+---+-----+---+
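To make the steps concrete, here is a rough plain-Python sketch of what the pipeline does to each row. It does not use Spark at all. The helper name explode_map is hypothetical and only mimics the behaviour of map_keys, map_values, arrays_zip and inline on ordinary Python dicts.

```python
# Plain-Python model of the Spark pipeline above; no Spark session needed.
# explode_map is a hypothetical helper for illustration, not a Spark API.
rows = [
    (1, {"Kira": 25, "Lilly": 15}),
    (2, {"Tom": 14}),
]


def explode_map(rows):
    """Mimic inline(arrays_zip(map_keys(m), map_values(m))) for each row."""
    out = []
    for row_id, label in rows:
        names = list(label.keys())    # map_keys: array of the map's keys
        ages = list(label.values())   # map_values: array of the map's values
        # arrays_zip pairs the two arrays element-wise;
        # inline then emits one output row per pair.
        for name, age in zip(names, ages):
            out.append((row_id, name, age))
    return out


result = explode_map(rows)
for row in result:
    print(row)
```

Python dicts keep insertion order, so this sketch lists Kira before Lilly. Spark makes no promise about the order of map entries, which is why the show() output above lists Lilly first. Add an explicit orderBy if the row order matters.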