How do I create new rows in a Dataset from the multiple values stored in an array column?
I have a Dataset with the following data:
+----+---------+-------------------+------------------+
|name|productId|              total|            scores|
+----+---------+-------------------+------------------+
| aaa|      200|               0.29|            [0.29]|
| bbb|      200| 1.3900000000000001|      [0.53, 0.33]|
| aaa|      100|0.22999999999999998|      [0.12, 0.11]|
+----+---------+-------------------+------------------+
I want to transform it into the following format in Scala:
+----+---------+-------------------+------------------+
|name|productId|              total|            scores|
+----+---------+-------------------+------------------+
| aaa|      200|               0.29|              0.29|
| bbb|      200| 1.3900000000000001|              0.53|
| bbb|      200| 1.3900000000000001|              0.33|
| aaa|      100|0.22999999999999998|              0.12|
| aaa|      100|0.22999999999999998|              0.11|
+----+---------+-------------------+------------------+
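This is exactly what the explode function is for. It emits one row per array element and copies the other columns unchanged. Reusing the column name "scores" replaces the array column with its elements, which gives the output above:
import org.apache.spark.sql.functions.{col, explode}
df.withColumn("scores", explode(col("scores")))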
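To see the row-multiplying behavior without a Spark session, here is a plain Scala sketch that models explode on the sample rows with flatMap. It is an analogy only, not how Spark implements explode. Record and ExplodedRecord are hypothetical stand-ins for the Dataset's rows, not Spark types. The sketch also reflects one detail worth knowing: explode drops rows whose array is empty or null, while explode_outer keeps them with a null value.

```scala
// Plain Scala model of explode on the question's data (no Spark required).
// Record and ExplodedRecord are hypothetical case classes mirroring the columns.
final case class Record(name: String, productId: Int, total: Double, scores: Seq[Double])
final case class ExplodedRecord(name: String, productId: Int, total: Double, scores: Double)

object ExplodeSketch {
  // One output record per element of `scores`; the other fields are copied as-is.
  // Like explode (and unlike explode_outer), an empty array yields no output record.
  def explodeScores(rows: Seq[Record]): Seq[ExplodedRecord] =
    rows.flatMap(r => r.scores.map(s => ExplodedRecord(r.name, r.productId, r.total, s)))

  // The sample input from the question.
  val input: Seq[Record] = Seq(
    Record("aaa", 200, 0.29, Seq(0.29)),
    Record("bbb", 200, 1.3900000000000001, Seq(0.53, 0.33)),
    Record("aaa", 100, 0.22999999999999998, Seq(0.12, 0.11))
  )

  def main(args: Array[String]): Unit =
    explodeScores(input).foreach(println)
}
```

The three input records become five output records, matching the row count in the desired table.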