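Given a DataFrame with two columns that have the structure below, how can I convert the data in start_date and end_date to yyyy/MM/dd format, with a date data type, in Spark SQL (Scala)?
The end_date struct can also be null.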
|-- start_date: struct (nullable = true)
| |-- year: integer (nullable = true)
| |-- month: integer (nullable = true)
| |-- day: integer (nullable = true)
|-- end_date: struct (nullable = true)
| |-- year: integer (nullable = true)
| |-- month: integer (nullable = true)
| |-- day: integer (nullable = true)
With Spark 2.4+ you can use the struct field access (.) operator here. Here is a code snippet.
scala> df.show
+--------------+--------------+
| start_date| end_date|
+--------------+--------------+
|[2019, 07, 11]|[2019, 08, 12]|
|[2019, 07, 14]|[2019, 08, 13]|
+--------------+--------------+
scala> df.printSchema
root
|-- start_date: struct (nullable = false)
| |-- year: string (nullable = true)
| |-- month: string (nullable = true)
| |-- day: string (nullable = true)
|-- end_date: struct (nullable = false)
| |-- year: string (nullable = true)
| |-- month: string (nullable = true)
| |-- day: string (nullable = true)
scala> val df1 = df.withColumn("start_date", date_format(concat_ws("-", col("start_date.year"), col("start_date.month"), col("start_date.day")), "yyyy/MM/dd")).withColumn("end_date", date_format(concat_ws("-", col("end_date.year"), col("end_date.month"), col("end_date.day")), "yyyy/MM/dd"))
scala> df1.show
+----------+----------+
|start_date| end_date|
+----------+----------+
|2019/07/11|2019/08/12|
|2019/07/14|2019/08/13|
+----------+----------+
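Note that date_format returns a string column, so the result above is formatted text rather than a date type. As a minimal sketch (the sample row and the column names as_string and as_date are made up for illustration, and a local SparkSession is assumed), the difference can be checked like this:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{DateType, StringType}

val spark = SparkSession.builder().master("local[1]").appName("date-format-check").getOrCreate()
import spark.implicits._

// Hypothetical sample row with string fields, matching the printSchema output above.
val sample = Seq(("2019", "07", "11"))
  .toDF("year", "month", "day")
  .select(struct($"year", $"month", $"day").as("start_date"))

// Join the three struct fields into text such as "2019-07-11".
val joined = concat_ws("-", $"start_date.year", $"start_date.month", $"start_date.day")

val out = sample
  .withColumn("as_string", date_format(joined, "yyyy/MM/dd")) // formatted text (StringType)
  .withColumn("as_date", to_date(joined))                     // real date value (DateType)

out.printSchema()
```

A DateType column has no display format of its own, so one option is to keep the value as a date and apply date_format only where the yyyy/MM/dd text is needed.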
Let me know if you have any questions.
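You can use a (nested) combination of to_date, format_string and concat_ws. In general, you can achieve 90% of what you need with the DataFrame functions.
I will provide more details once I wake up. It is already very late where I live...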
Update:
data.withColumn("start_date_as_date",
to_date(
concat_ws("/", $"start_date.year", $"start_date.month", $"start_date.day"),
"yyyy/MM/dd")
).show
+-------------+-------------+------------------+
| start_date| end_date|start_date_as_date|
+-------------+-------------+------------------+
| [776, 9, 1]| [2019, 9, 2]| 0776-09-01|
|[2019, 9, 18]|[2019, 9, 19]| 2019-09-18|
|[2019, 10, 1]|[2019, 10, 2]| 2019-10-01|
+-------------+-------------+------------------+
... .printSchema
root
|-- start_date: struct (nullable = true)
| |-- year: integer (nullable = false)
| |-- month: integer (nullable = false)
| |-- day: integer (nullable = false)
|-- end_date: struct (nullable = true)
| |-- year: integer (nullable = false)
| |-- month: integer (nullable = false)
| |-- day: integer (nullable = false)
|-- start_date_as_date: date (nullable = true)
Alternatively, you can also use:
format_string("%04d/%02d/%02d", // this lets you get creative if you want!
$"start_date.year", $"start_date.month", $"start_date.day")
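To tie this back to the question, here is a rough sketch of nesting format_string inside to_date for both columns while tolerating a null end_date. It assumes integer fields as in the question's schema (string fields would need a cast before %d), and the names raw, data and structToDate, as well as the sample rows, are hypothetical:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType

val spark = SparkSession.builder().master("local[1]").appName("struct-to-date").getOrCreate()
import spark.implicits._

// Hypothetical sample data with integer fields; the second row has no end date.
val raw = Seq(
  (2019, 7, 11, Option(2019), Option(8), Option(12)),
  (2019, 7, 14, Option.empty[Int], Option.empty[Int], Option.empty[Int])
).toDF("sy", "sm", "sd", "ey", "em", "ed")

val data = raw.select(
  struct($"sy".as("year"), $"sm".as("month"), $"sd".as("day")).as("start_date"),
  // when(...) without otherwise(...) yields a null struct for the second row
  when($"ey".isNotNull, struct($"ey".as("year"), $"em".as("month"), $"ed".as("day"))).as("end_date")
)

// Hypothetical helper: zero-pad the three fields into yyyy-MM-dd text, then parse it.
// Rows where any field is missing fall through the when(...) and become null.
def structToDate(name: String): Column = {
  val parts = Seq("year", "month", "day").map(f => col(s"$name.$f"))
  val allPresent = parts.map(_.isNotNull).reduce(_ && _)
  when(allPresent, to_date(format_string("%04d-%02d-%02d", parts: _*)))
}

val result = data
  .withColumn("start_date", structToDate("start_date"))
  .withColumn("end_date", structToDate("end_date"))

result.printSchema()
result.show()
```

Both columns come out as date type here, and the row with the missing end_date stays null. Padding the fields before parsing is a design choice in this sketch: it avoids depending on how a given Spark version treats single-digit months and days in a to_date pattern.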