How to convert a column of struct<year:int, month:int, day:int> into a normalized yyyy/MM/dd date data type in Scala



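Given a DataFrame with two columns of the struct type shown below, how can I convert the data in start_date and end_date to yyyy/MM/dd format with a date data type in Spark SQL (Scala)?

In addition, the end_date struct can be null.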

|-- start_date: struct (nullable = true)
|    |-- year: integer (nullable = true)
|    |-- month: integer (nullable = true)
|    |-- day: integer (nullable = true)
|-- end_date: struct (nullable = true)
|    |-- year: integer (nullable = true)
|    |-- month: integer (nullable = true)
|    |-- day: integer (nullable = true)

With Spark 2.4+, you can use the struct field access (.) operator here. I am sharing a code snippet.

scala> df.show
+--------------+--------------+
|    start_date|      end_date|
+--------------+--------------+
|[2019, 07, 11]|[2019, 08, 12]|
|[2019, 07, 14]|[2019, 08, 13]|
+--------------+--------------+
scala> df.printSchema
root
|-- start_date: struct (nullable = false)
|    |-- year: string (nullable = true)
|    |-- month: string (nullable = true)
|    |-- day: string (nullable = true)
|-- end_date: struct (nullable = false)
|    |-- year: string (nullable = true)
|    |-- month: string (nullable = true)
|    |-- day: string (nullable = true)
scala> val df1 = df.withColumn("start_date",
     |   date_format(concat_ws("-", col("start_date.year"), col("start_date.month"), col("start_date.day")), "yyyy/MM/dd")
     | ).withColumn("end_date",
     |   date_format(concat_ws("-", col("end_date.year"), col("end_date.month"), col("end_date.day")), "yyyy/MM/dd"))
scala> df1.show
+----------+----------+
|start_date|  end_date|
+----------+----------+
|2019/07/11|2019/08/12|
|2019/07/14|2019/08/13|
+----------+----------+
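
Note that date_format returns a string column, so the snippet above yields yyyy/MM/dd text rather than a date type. The following is a minimal sketch of one way to get a real DateType column and still cope with a null end_date. It assumes Spark 3.0 or later, where the SQL function make_date is available, and the sample rows and the local SparkSession are illustrative assumptions, not data from the question.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, expr}
import org.apache.spark.sql.types.{DateType, IntegerType, StructField, StructType}

// Assumption: a local session for experimenting (spark-shell already provides `spark`).
val spark = SparkSession.builder().master("local[*]").appName("struct-to-date").getOrCreate()

// Same shape as the schema in the question: two nullable structs of three ints.
val ymd = StructType(Seq(
  StructField("year", IntegerType),
  StructField("month", IntegerType),
  StructField("day", IntegerType)))
val schema = StructType(Seq(StructField("start_date", ymd), StructField("end_date", ymd)))

// Illustrative rows; the second row has a null end_date struct.
val rows = Seq(
  Row(Row(2019, 7, 11), Row(2019, 8, 12)),
  Row(Row(2019, 7, 14), null))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

// Fields of a null struct are null, and make_date returns null for null input,
// so a null end_date simply becomes a null date.
val asDates = df
  .withColumn("start_date", expr("make_date(start_date.year, start_date.month, start_date.day)"))
  .withColumn("end_date", expr("make_date(end_date.year, end_date.month, end_date.day)"))

// A DateType column has no format of its own; format it only where text is needed.
val display = asDates
  .withColumn("start_date_str", date_format(col("start_date"), "yyyy/MM/dd"))
  .withColumn("end_date_str", date_format(col("end_date"), "yyyy/MM/dd"))

asDates.printSchema()
display.show()
```

Keeping the date column and its yyyy/MM/dd rendering separate means comparisons and date arithmetic keep working on the typed column, while the string column is used only for output.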

Let me know if you have any questions.

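You can use a (nested) combination of to_date with format_string or concat_ws. In general, the DataFrame functions cover about 90% of what you need.

I'll provide more details after I wake up. It's already late where I live...

Update: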

data.withColumn("start_date_as_date",
  to_date(
    concat_ws("/", $"start_date.year", $"start_date.month", $"start_date.day"),
    "yyyy/MM/dd")
).show
+-------------+-------------+------------------+
|   start_date|     end_date|start_date_as_date|
+-------------+-------------+------------------+
|  [776, 9, 1]| [2019, 9, 2]|        0776-09-01|
|[2019, 9, 18]|[2019, 9, 19]|        2019-09-18|
|[2019, 10, 1]|[2019, 10, 2]|        2019-10-01|
+-------------+-------------+------------------+
... .printSchema
root
|-- start_date: struct (nullable = true)
|    |-- year: integer (nullable = false)
|    |-- month: integer (nullable = false)
|    |-- day: integer (nullable = false)
|-- end_date: struct (nullable = true)
|    |-- year: integer (nullable = false)
|    |-- month: integer (nullable = false)
|    |-- day: integer (nullable = false)
|-- start_date_as_date: date (nullable = true)

Alternatively, you can also use:

format_string("%04d/%02d/%02d", // this lets you get creative if you want!
  $"start_date.year", $"start_date.month", $"start_date.day")
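
To tie the two ideas together, here is a hedged sketch that zero-pads the fields with format_string and then parses the result with to_date, so the text always matches the yyyy/MM/dd pattern exactly. This may matter on newer Spark versions, which can be stricter about single-digit months and days than the version used above. The toDate helper, the sample values, and the local SparkSession are assumptions made for illustration.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, format_string, lit, struct, to_date, when}
import org.apache.spark.sql.types.DateType

// Assumption: a local session for experimenting (spark-shell already provides `spark`).
val spark = SparkSession.builder().master("local[*]").appName("format-string-to-date").getOrCreate()

// One illustrative row: a populated start_date and a null end_date struct.
val data = spark.range(1).select(
  struct(lit(2019).as("year"), lit(9).as("month"), lit(18).as("day")).as("start_date"),
  lit(null).cast("struct<year:int,month:int,day:int>").as("end_date"))

// Hypothetical helper: build a DateType column from a struct column name.
// The `when` guard leaves the result null when the struct itself is null,
// so a null struct is never formatted or parsed.
def toDate(name: String): Column =
  when(col(name).isNotNull,
    to_date(
      format_string("%04d/%02d/%02d", col(s"$name.year"), col(s"$name.month"), col(s"$name.day")),
      "yyyy/MM/dd"))

val out = data
  .withColumn("start_date_as_date", toDate("start_date"))
  .withColumn("end_date_as_date", toDate("end_date"))

out.printSchema()
out.show()
```

The same toDate helper can be applied to any column with year, month, and day fields, which avoids repeating the nested expression for start_date and end_date.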
