I am new to Spark. When I use add_months() on a timestamp-type column, it returns a date type. How can I preserve the hour:minute:seconds part?
df.where($"DEAL_ID" === "deal1" && $"POOL_ID" ==="pool_1")
.select("LVALID_DEAL_DATE","LAST_PROCESS_DATE")
.withColumn("test", add_months($"LAST_PROCESS_DATE", -3))
.show
Output:
+--------------------+--------------------+----------+
|    LVALID_DEAL_DATE|   LAST_PROCESS_DATE|      test|
+--------------------+--------------------+----------+
|2016-05-01 00:00:...|2016-08-01 19:38:...|2016-05-01|
+--------------------+--------------------+----------+
It looks like add_months only supports the date type; if you pass it a timestamp, only the date part is returned. I tried the code below with the unix_timestamp function, but it resets HH:mm:ss to 00:00:00.
df.withColumn("New Dates",unix_timestamp(add_months(df("Dates"),1)).cast("timestamp")).show
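To see why the time ends up as 00:00:00, here is a plain-Scala analogue using java.time (not Spark, just an illustration of the same behavior): add_months keeps only the date part of its input, so converting the result back to a timestamp can only land at midnight.

```scala
import java.time.{LocalDate, LocalDateTime}

// add_months(timestamp, n) behaves like: take the date part, shift by n months
val ts = LocalDateTime.parse("2016-08-01T19:38:25") // sample value, like LAST_PROCESS_DATE
val shifted: LocalDate = ts.toLocalDate.plusMonths(-3) // date only: 2016-05-01, time-of-day is gone

// casting that date back to a timestamp can only give midnight
val backToTs: LocalDateTime = shifted.atStartOfDay // 2016-05-01T00:00
```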
We can use a trick here: convert the timestamp to a unix epoch and back. First capture the time-of-day component in milliseconds, then add it back onto the new date produced by add_months. Suppose we have the following DataFrame df:

df.show()
+---+
| id|
+---+
| 1|
| 2|
| 3|
| 4|
| 5|
+---+
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType

val dfttimestamp = df.withColumn("StartDateTimeEpoch", lit(1573362092000L))
  .withColumn("StartDateTimeStamp", to_utc_timestamp(to_timestamp(col("StartDateTimeEpoch") / 1000), "UTC"))
  .withColumn("StartDateTimeTruncated", unix_timestamp(col("StartDateTimeStamp").cast(DateType)) * 1000) // truncate the time component by casting to Date
  .withColumn("StartTimeMillisDiff", col("StartDateTimeEpoch") - col("StartDateTimeTruncated")) // time-of-day component in millis
  .withColumn("StartDate_NextYr", add_months(col("StartDateTimeStamp"), 12)) // add 12 months to get next year; result is a Date column
  .withColumn("StartDateTimeEpoch_NextYr", unix_timestamp(col("StartDate_NextYr")) * 1000 + col("StartTimeMillisDiff")) // convert the Date to a unix timestamp and add back the previously computed diff in millis
  .withColumn("StartDateTimeStamp_NextYr", to_utc_timestamp(to_timestamp(col("StartDateTimeEpoch_NextYr") / 1000), "UTC"))

dfttimestamp.show()
dfttimestamp.printSchema()
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
| id|StartDateTimeEpoch| StartDateTimeStamp|StartDateTimeTruncated|StartTimeMillisDiff|StartDate_NextYr|StartDateTimeEpoch_NextYr|StartDateTimeStamp_NextYr|
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
| 1| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 2| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 3| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 4| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 5| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
root
|-- id: integer (nullable = false)
|-- StartDateTimeEpoch: long (nullable = false)
|-- StartDateTimeStamp: timestamp (nullable = true)
|-- StartDateTimeTruncated: long (nullable = true)
|-- StartTimeMillisDiff: long (nullable = true)
|-- StartDate_NextYr: date (nullable = true)
|-- StartDateTimeEpoch_NextYr: long (nullable = true)
|-- StartDateTimeStamp_NextYr: timestamp (nullable = true)
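The epoch arithmetic above can be checked in plain Scala with java.time, independently of Spark. This sketch reproduces the same numbers as the table, using the sample epoch 1573362092000L from the answer:

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}

val epochMillis = 1573362092000L // sample value from the DataFrame above

// interpret the epoch as a UTC timestamp: 2019-11-10 05:01:32
val ts = LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC)

// truncate to midnight, like casting the timestamp to DateType
val truncatedMillis = ts.toLocalDate.atStartOfDay.toInstant(ZoneOffset.UTC).toEpochMilli

// time-of-day component in millis: 18092000 = 05:01:32
val timeDiff = epochMillis - truncatedMillis

// add 12 months to the date part, then re-add the time component
val nextYrDateMillis = ts.toLocalDate.plusMonths(12).atStartOfDay.toInstant(ZoneOffset.UTC).toEpochMilli
val nextYrMillis = nextYrDateMillis + timeDiff // 2020-11-10 05:01:32
```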