How to express Spark SQL's time window function in SQL



I have a simple DataFrame with the following schema:

word: string
process_time: timestamp
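
For reference, here is a minimal sketch of how such a DataFrame could be built in a spark-shell session; the sample rows are hypothetical:

import java.sql.Timestamp
import spark.implicits._

// Hypothetical sample data matching the schema above.
val wordsDs = Seq(
  ("hello", Timestamp.valueOf("2024-01-01 00:00:05")),
  ("world", Timestamp.valueOf("2024-01-01 00:00:12")),
  ("hello", Timestamp.valueOf("2024-01-01 00:00:20"))
).toDF("word", "process_time")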

I group by a time window and count over the grouped DataFrame:

import org.apache.spark.sql.functions.window

val windowedCount = wordsDs
  .groupBy(window($"process_time", "15 seconds"))
  .count()

How can this code be ported to SQL using Spark SQL syntax?

This is almost a one-to-one translation:

spark.sql("""SELECT window(process_time, "15 seconds"), count(*) 
FROM wordDs 
GROUP BY window(process_time, "15 seconds")""")
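
Note that spark.sql can only resolve the name wordsDs if the Dataset has been registered as a view first; assuming the wordsDs value from the question:

// Register the Dataset under a name visible to SQL queries.
wordsDs.createOrReplaceTempView("wordsDs")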

Or:

spark.sql("""WITH tmp AS(SELECT window(process_time, "15 seconds") w FROM wordDs)
SELECT w, count(*) FROM tmp GROUP BY w""")
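
In both variants the grouping key is a struct column with start and end timestamp fields. A sketch of unpacking it with the DataFrame API, using the windowedCount value from the question (groupBy(window(...)) names the struct column window, and count() names its result count):

// Flatten the window struct into plain timestamp columns.
windowedCount.select($"window.start", $"window.end", $"count").show()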
