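I know this question has been asked several times, and I have searched and read through the existing questions and answers. I have also read the database documentation and made several attempts, but I still can't get the result I want.
Given: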
+----------------------------+
| data_type |
+----------------------------+
| timestamp |
+----------------------------+
,
+------------------------------+
| data_value |
+------------------------------+
| 2017-11-22T00:00:00.000+0000 |
+------------------------------+
Desired result:
+----------------------------+
| data_value |
+----------------------------+
| 22.11.2017 |
+----------------------------+
What I have tried so far, without success:
date_format(date_value, 'dd.mm.yyyy') AS MFGDate,
to_date(date_value) AS MFGDate,
date(date_value) AS MFGDate
Result:
+------------+------------+------------+
| MFGDate    | MFGDate    | MFGDate    |
+------------+------------+------------+
| 22.00.2017 | 2017-11-22 | 2017-11-22 |
+------------+------------+------------+
Full query:
SELECT
'01 FUV' AS Stage,
d1.ps_name AS FUV,
d1.ps_name AS LOT,
d2.date_value AS MFGDate
FROM
table d1
INNER JOIN table d2 ON d1.ag_id = d2.ag_id
AND d1.ag_path = d2.ag_path
AND d1.ps_name = d2.ps_name
WHERE
d1.AG_PATH LIKE 'sourcepath'
Result:
+--------+--------+--------+------------------------------+
| Stage | FUV | Lot | MFGDate |
+--------+--------+--------+------------------------------+
| 01 FUV | A1U079 | A1U079 | 2019-03-27T00:00:00.000+0000 |
| 01 FUV | A1U255 | A1U255 | 2019-06-22T00:00:00.000+0000 |
| 01 FUV | A1U255 | A1U255 | 2019-11-10T00:00:00.000+0000 |
How can I get the values of the MFGDate column in the format 22.11.2017?
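You can use the built-in date_format function; what you are missing is the correct use of the pattern symbols. The linked documentation explains what each symbol means.
Typical usage: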
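from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

input_list = [
    (1, "2019-11-07 05:30:00"),
    (2, "2019-07-09 15:30:00"),
    (3, "2019-12-09 10:30:00"),
    (4, "2019-02-11 14:30:00"),
]
sparkDF = spark.createDataFrame(input_list, ['id', 'date'])
# Parse the strings into timestamps (MM = month, mm = minute)
sparkDF = sparkDF.withColumn('date', F.to_timestamp(F.col('date'), 'yyyy-MM-dd HH:mm:ss'))
# Format the timestamp as day.month.year
sparkDF = sparkDF.withColumn('date_formated', F.date_format(F.col('date'), 'dd.MM.yyyy'))
sparkDF.show()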
+---+-------------------+-------------+
| id| date|date_formated|
+---+-------------------+-------------+
| 1|2019-11-07 05:30:00| 07.11.2019|
| 2|2019-07-09 15:30:00| 09.07.2019|
| 3|2019-12-09 10:30:00| 09.12.2019|
| 4|2019-02-11 14:30:00| 11.02.2019|
+---+-------------------+-------------+
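You were close. You can use the built-in date_format function, but you got "00" for the month because your format pattern is wrong. You specified "mm", which returns the minute of the hour; you should specify "MM", which returns the month of the year. So the correct code is: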
date_format(date_value, 'dd.MM.yyyy') AS MFGDate
The documentation is here: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
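If it helps to see why 'dd.mm.yyyy' produced 22.00.2017, here is a rough plain-Python sketch that mimics a handful of the pattern letters. It is not Spark code: `render` and `FIELDS` are hypothetical names made up for this illustration, and only the six letters listed are handled. It just shows that "mm" picks up the minute (00 for a midnight timestamp) while "MM" picks up the month.

```python
import re
from datetime import datetime

# Hypothetical lookup table: a small subset of the datetime pattern letters.
FIELDS = {
    "yyyy": lambda t: f"{t.year:04d}",   # year
    "MM": lambda t: f"{t.month:02d}",    # month of year
    "dd": lambda t: f"{t.day:02d}",      # day of month
    "HH": lambda t: f"{t.hour:02d}",     # hour of day (0-23)
    "mm": lambda t: f"{t.minute:02d}",   # minute of hour
    "ss": lambda t: f"{t.second:02d}",   # second of minute
}

def render(ts, pattern):
    """Replace each known pattern token with the matching field of ts."""
    return re.sub(
        "yyyy|MM|dd|HH|mm|ss",
        lambda m: FIELDS[m.group(0)](ts),
        pattern,
    )

# The value from the question, parsed with the standard library.
ts = datetime.strptime("2017-11-22T00:00:00.000+0000", "%Y-%m-%dT%H:%M:%S.%f%z")

wrong = render(ts, "dd.mm.yyyy")  # day.minute.year
right = render(ts, "dd.MM.yyyy")  # day.month.year
print(wrong)  # 22.00.2017
print(right)  # 22.11.2017
```

Note that Python's own strftime uses the opposite casing (%m is the month, %M is the minute), so the letters cannot be carried over between the two without checking.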