Split one row into several rows in Spark Scala



I have a table in which all of the data looks like this: 03104|00000000000000105000|00000000000000002000|001|00095|000000000000000000000000000000000000000000001835162021-07-15

I want to split this column into separate rows, like this:

+--------------------+
|               value|
+--------------------+
|               03104|
|00000000000000105000|
|00000000000000002000|
|                 001|
|               00095|
|00000000000000000...|
+--------------------+

How can I do this?

You can split the column on the pipe character (|) to get an array, then call explode or explode_outer to turn each array element into its own row.
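One detail worth knowing: Spark's split function treats its pattern argument as a Java regular expression, and Scala's String.split does the same. A bare "|" is therefore regex alternation, not a literal pipe, and has to be escaped as "\\|". The plain-Scala sketch below needs no Spark; SplitEscapeDemo and splitOnPipe are hypothetical names used only for illustration.

```scala
object SplitEscapeDemo {
  // String.split, like Spark's split(col, pattern), interprets its pattern
  // as a Java regular expression, so the pipe must be escaped as "\\|".
  def splitOnPipe(s: String): Array[String] = s.split("\\|")

  def main(args: Array[String]): Unit = {
    val line = "03104|00000000000000105000|00000000000000002000|001|00095"
    // Prints one field per line: 03104, 00000000000000105000, ...
    splitOnPipe(line).foreach(println)

    // Unescaped, "|" is an alternation of two empty patterns: it matches the
    // empty string between every pair of characters, so the input is cut
    // into single characters instead of fields.
    println("a|b".split("|").mkString(",")) // prints a,|,b
  }
}
```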

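import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode_outer, split}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.implicits._

// Build a one-row DataFrame holding the pipe-delimited string
List("03104|00000000000000105000|00000000000000002000|001|00095|" +
  "000000000000000000000000000000000000000000001835162021-07-15")
  .toDF("value")
  // split takes a regex, so the pipe is escaped; explode_outer emits one row per element
  .select(explode_outer(split(col("value"), "\\|")).as("value"))
  .show(false)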
/*
+------------------------------------------------------------+
|value                                                       |
+------------------------------------------------------------+
|03104                                                       |
|00000000000000105000                                        |
|00000000000000002000                                        |
|001                                                         |
|00095                                                       |
|000000000000000000000000000000000000000000001835162021-07-15|
+------------------------------------------------------------+ */
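The answer mentions both explode and explode_outer. They give the same result here and differ only when the array is null or empty: explode drops that input row, while explode_outer keeps it and emits a single null. A minimal sketch, assuming a hypothetical two-row DataFrame with an id column where one value is null; ExplodeVsExplodeOuter and explodeCounts are illustrative names, not part of any library.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, explode_outer, split}

object ExplodeVsExplodeOuter {
  // Returns (rows after explode, rows after explode_outer) for a small example
  def explodeCounts(): (Long, Long) = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    // Hypothetical data: row "a" has three pipe-separated parts, row "b" is null
    val df = Seq(("a", "03104|001|00095"), ("b", null)).toDF("id", "value")
    val parts = split(col("value"), "\\|") // split of a null string yields a null array

    // explode skips row "b" entirely: 3 rows, all with id "a"
    val inner = df.select(col("id"), explode(parts).as("value"))
    // explode_outer keeps row "b" as (b, null): 4 rows
    val outer = df.select(col("id"), explode_outer(parts).as("value"))

    outer.show(false)
    (inner.count(), outer.count())
  }

  def main(args: Array[String]): Unit = {
    println(explodeCounts())
  }
}
```

If rows with a missing value should simply disappear, explode is enough; if every input row must survive (for example, to keep its id), use explode_outer.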
