当spark dataframe做add_months(),如果列是时间戳类型,但返回日期类型,我怎么能保持小时:分钟:



我是spark的新手。当我使用add_months()时,如果列是时间戳类型,它返回日期类型。如何保存hour:minute:seconds格式?

df.where($"DEAL_ID" === "deal1" && $"POOL_ID" ==="pool_1")
  .select("LVALID_DEAL_DATE","LAST_PROCESS_DATE")
  .withColumn("test", add_months($"LAST_PROCESS_DATE", -3))
  .show

输出
|    LVALID_DEAL_DATE|   LAST_PROCESS_DATE|      test|
|2016-05-01 00:00:...|2016-08-01 19:38:...|2016-05-01|

看起来add_months只支持日期类型。如果传入时间戳类型,则只返回日期部分。我使用unix_timestamp函数尝试了下面的代码,它将HH:mm:ss转换为00。

 df.withColumn("New Dates",unix_timestamp(add_months(df("Dates"),1)).cast("timestamp")).show

我们可以在这里做一个技巧,将时间戳来回转换为unix_epoch,首先以毫秒为单位获取时间组件,然后将其添加回使用add_months

添加月份后获得的新日期

假设我们在

下面有一个数据框架df
df.show()
+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
|  5|
+---+

import org.apache.spark.sql.types.DateType
val dfttimestamp = df.withColumn("StartDateTimeEpoch", lit(1573362092000L))
                                                .withColumn("StartDateTimeStamp", to_utc_timestamp(to_timestamp(col("StartDateTimeEpoch")/1000), "UTC"))
                                                .withColumn("StartDateTimeTruncated", unix_timestamp(col("StartDateTimeStamp").cast(DateType)) * 1000) //truncate time component by converting to Date
                                                .withColumn("StartTimeMillisDiff", col("StartDateTimeEpoch") - col("StartDateTimeTruncated")) //get time component in millis
                                                .withColumn("StartDate_NextYr", add_months(col("StartDateTimeStamp"),12)) //add 12 months to get next year, as Date column
                                                .withColumn("StartDateTimeEpoch_NextYr", unix_timestamp(col("StartDate_NextYr")) * 1000 + col("StartTimeMillisDiff")) // conver Date to unix-timestamp and add the prevous calculated diff in millis
                                                .withColumn("StartDateTimeStamp_NextYr", to_utc_timestamp(to_timestamp(col("StartDateTimeEpoch_NextYr")/1000), "UTC"))

dfttimestamp.show()
dfttimestamp.printSchema()

+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
| id|StartDateTimeEpoch| StartDateTimeStamp|StartDateTimeTruncated|StartTimeMillisDiff|StartDate_NextYr|StartDateTimeEpoch_NextYr|StartDateTimeStamp_NextYr|
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
|  1|     1573362092000|2019-11-10 05:01:32|         1573344000000|           18092000|      2020-11-10|            1604984492000|      2020-11-10 05:01:32|
|  2|     1573362092000|2019-11-10 05:01:32|         1573344000000|           18092000|      2020-11-10|            1604984492000|      2020-11-10 05:01:32|
|  3|     1573362092000|2019-11-10 05:01:32|         1573344000000|           18092000|      2020-11-10|            1604984492000|      2020-11-10 05:01:32|
|  4|     1573362092000|2019-11-10 05:01:32|         1573344000000|           18092000|      2020-11-10|            1604984492000|      2020-11-10 05:01:32|
|  5|     1573362092000|2019-11-10 05:01:32|         1573344000000|           18092000|      2020-11-10|            1604984492000|      2020-11-10 05:01:32|
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
root
 |-- id: integer (nullable = false)
 |-- StartDateTimeEpoch: long (nullable = false)
 |-- StartDateTimeStamp: timestamp (nullable = true)
 |-- StartDateTimeTruncated: long (nullable = true)
 |-- StartTimeMillisDiff: long (nullable = true)
 |-- StartDate_NextYr: date (nullable = true)
 |-- StartDateTimeEpoch_NextYr: long (nullable = true)
 |-- StartDateTimeStamp_NextYr: timestamp (nullable = true)

相关内容

  • 没有找到相关文章

最新更新