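How do I get the date difference (number of days in between) in a Scala DataFrame?
I have a df: [id: string, itemName: string, eventTimeStamp: timestamp]
and a startTime (a timestamp string). How do I get a column "Daydifference" holding the number of days between startTime and the timestamp?
My code:
Initial df: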
+------------+-----------+-------------------------+
| id | itemName | eventTimeStamp |
----------------------------------------------------
| 1 | TV | 2016-09-19T00:00:00Z |
| 1 | Movie | 2016-09-19T00:00:00Z |
| 1 | TV | 2016-09-26T00:00:00Z |
| 2 | TV | 2016-09-18T00:00:00Z |
I need the most recent eventTimeStamp for each id and itemName, so I did this:
val result = df.groupBy("id", "itemName").agg(max("eventTimeStamp") as "mostRecent")
+------------+-----------+-------------------------+
| id | itemName | mostRecent |
----------------------------------------------------
| 1 | TV | 2016-09-26T00:00:00Z |
| 1 | Movie | 2016-09-19T00:00:00Z |
| 2 | TV | 2016-09-26T00:00:00Z |
Now I need the date difference between mostRecent and startTime (2016-09-29T00:00:00Z), so that I end up with:
{ id : 1, {"itemMap" : {"TV" : 3, "Movie" : 10 }} }
{ id : 2, {"itemMap" : {"TV" : 3}} }
I tried:
val startTime = "2016-09-26T00:00:00Z"
val result = df.groupBy("id", "itemName").agg(datediff(startTime, max("eventTimeStamp")) as Daydifference)
case class Data (itemMap : Map[String, Long]) extends Serializable
result.map{
case r =>
val id = r.getAs[String]("id")
val itemName = r.getAs[String]("itemName")
val Daydifference = r.getAs[Long]("Daydifference")
(id, Map(itemName -> Daydifference ))
}.reduceByKey((x, y) => x ++ y).map{
case (k, v) =>
(k, JacksonUtil.toJson(Data(v)))
}
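But I get an error on datediff. Can anyone tell me how to do this?
When you want to use a constant ("literal") value as a Column in a DataFrame, you should use the lit(...) function. The other mistake here is passing a String as the start date; to compare it with a timestamp column, you can use a java.sql.Date: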
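import org.apache.spark.sql.functions.{datediff, lit, max}

// Date.valueOf parses "yyyy-MM-dd"; the deprecated (year, month, day)
// constructor expects the year minus 1900 and a zero-based month
val startTime = java.sql.Date.valueOf("2016-09-26")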
val result = df.groupBy("id", "itemName")
.agg(datediff(lit(startTime), max("eventTimeStamp")) as "Daydifference")
result.show()
// +---+--------+-------------+
// | id|itemName|Daydifference|
// +---+--------+-------------+
// | 1| Movie| 7|
// | 1| TV| 0|
// | 2| TV| 0|
// +---+--------+-------------+
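
As a quick sanity check on the numbers the question expects, here is a minimal plain-Scala sketch (no Spark involved) that computes the same kind of calendar-day difference from the ISO-8601 strings and builds a java.sql.Date that could be passed to lit(...). The names DayDiffCheck, utcDate and dayDifference are made up for illustration, and the sketch assumes every timestamp should be read in UTC; Spark converts timestamps to dates using its own configured time zone, so results near midnight may differ.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.temporal.ChronoUnit

object DayDiffCheck {
  // Parse an ISO-8601 instant such as "2016-09-29T00:00:00Z" and keep
  // only its calendar date in UTC (assumption: UTC is the wanted zone).
  def utcDate(iso: String): LocalDate =
    Instant.parse(iso).atZone(ZoneOffset.UTC).toLocalDate

  // Whole calendar days from `start` to `end`; positive when `end` is later.
  // The argument order (end first) mirrors datediff(end, start).
  def dayDifference(end: String, start: String): Long =
    ChronoUnit.DAYS.between(utcDate(start), utcDate(end))

  def main(args: Array[String]): Unit = {
    val startTime = "2016-09-29T00:00:00Z"

    println(dayDifference(startTime, "2016-09-26T00:00:00Z")) // 3
    println(dayDifference(startTime, "2016-09-19T00:00:00Z")) // 10

    // A java.sql.Date built from the same string, usable as lit(sqlDate)
    val sqlDate = java.sql.Date.valueOf(utcDate(startTime))
    println(sqlDate) // 2016-09-29
  }
}
```

With startTime set to 2016-09-29, this gives 3 for a mostRecent of 2016-09-26 and 10 for 2016-09-19, matching the TV and Movie values for id 1 in the question's desired output.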