I have a table in which every value looks like this: 03104|00000000000000105000|00000000000000002000|001|00095|000000000000000000000000000000000000000000001835162021-07-15
I want to split this column into one row per field:
+--------------------+
|               value|
+--------------------+
|               03104|
|00000000000000105000|
|00000000000000002000|
|                 001|
|               00095|
|00000000000000000...|
+--------------------+
How can I do this?
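You can split the column on | to get an array, then call explode/explode_outer on that array to get one row per element. Because split interprets its pattern as a regular expression, the pipe has to be escaped.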
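import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode_outer, split}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.implicits._

List("03104|00000000000000105000|00000000000000002000|001|00095|" +
  "000000000000000000000000000000000000000000001835162021-07-15")
  .toDF("value")
  // split takes a regex, so the pipe is escaped; explode_outer emits one row per array element
  .select(explode_outer(split($"value", "\\|")).as("value"))
  .show(false)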
/*
+------------------------------------------------------------+
|value                                                       |
+------------------------------------------------------------+
|03104                                                       |
|00000000000000105000                                        |
|00000000000000002000                                        |
|001                                                         |
|00095                                                       |
|000000000000000000000000000000000000000000001835162021-07-15|
+------------------------------------------------------------+ */
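
The escaping matters because Spark's split treats its pattern as a Java regular expression, and a bare | means alternation rather than a literal pipe. The following is a minimal sketch in plain Scala (no Spark, assuming the JVM's regex behaviour) that shows the difference on the sample record from the question. The object name PipeSplitSketch and its field names are hypothetical, chosen only for illustration.

```scala
import java.util.regex.Pattern

object PipeSplitSketch {
  // Sample record copied from the question.
  val record: String =
    "03104|00000000000000105000|00000000000000002000|001|00095|" +
      "000000000000000000000000000000000000000000001835162021-07-15"

  // An unescaped "|" is a regex alternation of two empty patterns. It matches
  // the empty string everywhere, so the record is cut between characters
  // instead of at the pipes.
  val unescaped: Array[String] = record.split("|")

  // Escaping the pipe makes it a literal delimiter.
  val escaped: Array[String] = record.split("\\|")

  // Pattern.quote builds a literal pattern for any delimiter string,
  // which avoids hand-written escapes.
  val quoted: Array[String] = record.split(Pattern.quote("|"))

  def main(args: Array[String]): Unit = {
    println(s"unescaped pieces: ${unescaped.length}")
    println(s"escaped pieces:   ${escaped.length}")
    escaped.foreach(println)
  }
}
```

The same escaped pattern, or the string produced by Pattern.quote("|"), can be passed as the second argument of Spark's split.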