Week of month starting on Monday in Spark 2.4



In Spark, the week of the month appears to run from Sunday to Saturday by default:

spark.sql(""" select date_format("2022-07-10","W") as week,date_format("2022-07-16","W") as week2 """).show()

Output: 3 3

Is there a way to compute the week of the month with weeks running from Monday to Sunday instead?

spark.sql(""" select date_format("2022-07-10","W") as week """).show()

Expected output: 2

I am looking for a Spark SQL way to achieve this.

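There is no built-in way to do this. You can use the following expression, which takes the Sunday-based week from date_format and corrects it (weekday returns 0 for Monday through 6 for Sunday):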

date_format(col_date, 'W') +
CASE weekday(date_trunc('MM', col_date))
    WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
    ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
END

Test:

df = spark.createDataFrame(
    [("2022-07-01",), ("2022-07-02",),
     ("2022-07-03",), ("2022-07-10",),
     ("2022-05-01",), ("2022-05-02",)],
    ["col_date"])
df.createOrReplaceTempView("table")
spark.sql(
    """
    SELECT
        col_date,
        date_format(col_date, 'W') as week1,
        (
            date_format(col_date, 'W') +
            CASE weekday(date_trunc('MM', col_date))
                WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
                ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
            END
        ) as week2
    FROM table
    """
).show()
# +----------+-----+-----+
# |  col_date|week1|week2|
# +----------+-----+-----+
# |2022-07-01|    1|  1.0|
# |2022-07-02|    1|  1.0|
# |2022-07-03|    2|  1.0|
# |2022-07-10|    3|  2.0|
# |2022-05-01|    1|  1.0|
# |2022-05-02|    1|  2.0|
# +----------+-----+-----+
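
The six rows above cover only a few cases. As a rough sanity check, the same branching can be replayed in plain Python (no Spark needed) and compared against a direct Monday-based calculation for every day of a year. This is only a sketch: it assumes that date_format(..., 'W') counts Sunday-first weeks and treats a partial first week as week 1, which is what the outputs above suggest. The function names are made up for illustration.

```python
from datetime import date, timedelta


def sunday_week_of_month(d):
    """Week of month with Sunday-first weeks (assumed behaviour of 'W')."""
    first = d.replace(day=1)
    days_since_sunday = (first.weekday() + 1) % 7  # Python: 0=Mon .. 6=Sun
    return (d.day - 1 + days_since_sunday) // 7 + 1


def monday_week_of_month(d):
    """Week of month with Monday-first weeks, computed directly."""
    first = d.replace(day=1)
    return (d.day - 1 + first.weekday()) // 7 + 1


def adjusted_week(d):
    """Same branching as the CASE expression (6 = Sunday in both Spark and Python)."""
    first_is_sunday = d.replace(day=1).weekday() == 6
    is_sunday = d.weekday() == 6
    if first_is_sunday:
        adjustment = 0 if is_sunday else 1
    else:
        adjustment = -1 if is_sunday else 0
    return sunday_week_of_month(d) + adjustment


def all_days(year):
    """Yield every date of the given year."""
    d = date(year, 1, 1)
    while d.year == year:
        yield d
        d += timedelta(days=1)


# Dates where the adjusted value disagrees with the direct calculation.
mismatches = [d for d in all_days(2022)
              if adjusted_week(d) != monday_week_of_month(d)]
print(mismatches)  # []
```

An empty list means the correction agrees with the direct calculation for all of 2022, under the assumption stated above.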

For weeks that start on another day (for example Tuesday), the following expression works better:

date_format(col_date, 'W') +
CASE dayofweek(date_trunc('MM', col_date)) < 3
    WHEN TRUE THEN (CASE dayofweek(col_date) < 3 WHEN TRUE THEN 0 ELSE 1 END)
    ELSE (CASE dayofweek(col_date) < 3 WHEN TRUE THEN -1 ELSE 0 END)
END

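Here 3 is the dayofweek number of the day the week starts on (dayofweek returns 1 for Sunday through 7 for Saturday):

3 for Tuesday
4 for Wednesday
5 for Thursday
6 for Friday
7 for Saturday

(The number has to be changed in all 3 places.)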
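
A possible alternative to patching the Sunday-based value is to compute the week of the month arithmetically from the day of the month and the weekday of the 1st. The sketch below shows the idea in plain Python. week_of_month and week_of_month_sql are hypothetical helpers, not part of Spark, and the generated SQL string is an untested assumption that relies only on dayofmonth, weekday, date_trunc and floor. Note that week_start here uses the weekday numbering (0 = Monday through 6 = Sunday), not the dayofweek numbering used above.

```python
from datetime import date


def week_of_month(d, week_start=0):
    """Week of month for weeks beginning on week_start (0=Monday .. 6=Sunday)."""
    first = d.replace(day=1)
    # Days between the start of the first (possibly partial) week and the 1st.
    offset = (first.weekday() - week_start) % 7
    return (d.day - 1 + offset) // 7 + 1


def week_of_month_sql(col, week_start=0):
    """Hypothetical helper: build a Spark SQL expression with the same arithmetic."""
    offset = f"((weekday(date_trunc('MM', {col})) + 7 - {week_start}) % 7)"
    return f"CAST(floor((dayofmonth({col}) - 1 + {offset}) / 7) + 1 AS INT)"


print(week_of_month(date(2022, 7, 10)))                # 2 (Monday-first)
print(week_of_month(date(2022, 7, 10), week_start=6))  # 3 (Sunday-first)
print(week_of_month(date(2022, 7, 5), week_start=1))   # 2 (Tuesday-first)
print(week_of_month_sql("col_date"))
```

With this form only one number changes when the start day changes, and the result is an integer rather than a double.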
