In Spark, the week of the month appears to run from Sunday to Saturday by default:
spark.sql(""" select date_format("2022-07-10","W") as week,date_format("2022-07-16","W") as week2 """).show()
Output: 3 3
Is there a way to compute the week of the month with weeks running from Monday to Sunday instead?
spark.sql(""" select date_format("2022-07-10","W") as week """).show()
Expected output: 2
I am looking for a Spark SQL way to achieve this.
There is no built-in way to do this. You can use this expression:
-- weekday(): 0 = Monday, ..., 6 = Sunday
date_format(col_date, 'W') +
CASE weekday(date_trunc('MM', col_date))  -- weekday of the 1st of the month
    WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
    ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
END
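Test (week2 shows up as a double because adding an integer to the string returned by date_format makes Spark cast the result to double):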
df = spark.createDataFrame([("2022-07-01",), ("2022-07-02",),
                            ("2022-07-03",), ("2022-07-10",),
                            ("2022-05-01",), ("2022-05-02",)],
                           ["col_date"])
df.createOrReplaceTempView("table")
spark.sql(
    """
    SELECT
        col_date,
        date_format(col_date, 'W') as week1,
        (
            date_format(col_date, 'W') +
            CASE weekday(date_trunc('MM', col_date))
                WHEN 6 THEN (CASE weekday(col_date) WHEN 6 THEN 0 ELSE 1 END)
                ELSE (CASE weekday(col_date) WHEN 6 THEN -1 ELSE 0 END)
            END
        ) as week2
    FROM table
    """
).show()
# +----------+-----+-----+
# |  col_date|week1|week2|
# +----------+-----+-----+
# |2022-07-01|    1|  1.0|
# |2022-07-02|    1|  1.0|
# |2022-07-03|    2|  1.0|
# |2022-07-10|    3|  2.0|
# |2022-05-01|    1|  1.0|
# |2022-05-02|    1|  2.0|
# +----------+-----+-----+
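The week2 values above can be cross-checked without a Spark session. The following is a minimal plain-Python sketch, not part of the original answer. It assumes that week 1 always contains the 1st of the month and that a new week begins on every Monday; the helper name week_of_month_monday is made up for illustration.

```python
from datetime import date


def week_of_month_monday(d: date) -> int:
    """Week of the month with weeks running Monday to Sunday.

    Assumption: week 1 always contains the 1st of the month,
    however short that first week is.
    """
    # date.weekday(): 0 = Monday ... 6 = Sunday, the same numbering
    # that Spark's weekday() uses.
    offset = d.replace(day=1).weekday()
    return (d.day - 1 + offset) // 7 + 1


# Same dates as in the test table above.
for s in ["2022-07-01", "2022-07-02", "2022-07-03",
          "2022-07-10", "2022-05-01", "2022-05-02"]:
    print(s, week_of_month_monday(date.fromisoformat(s)))
```

For the six test dates this gives 1, 1, 1, 2, 1, 2, which agrees with the week2 column.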
To start the week on a different day (for example Tuesday), the following more general expression works better:
-- dayofweek(): 1 = Sunday, 2 = Monday, 3 = Tuesday, ..., 7 = Saturday
date_format(col_date, 'W') +
CASE dayofweek(date_trunc('MM', col_date)) < 3
    WHEN TRUE THEN (CASE dayofweek(col_date) < 3 WHEN TRUE THEN 0 ELSE 1 END)
    ELSE (CASE dayofweek(col_date) < 3 WHEN TRUE THEN -1 ELSE 0 END)
END
The 3 stands for Tuesday; use 4 for Wednesday, 5 for Thursday, 6 for Friday or 7 for Saturday (the number has to be changed in 3 places).
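One way to gain confidence in the generalized expression is to transcribe it into plain Python and compare it with an independent computation for every date of a year. The sketch below is not from the original answer. It assumes date_format(..., 'W') behaves as a Sunday-to-Saturday week of the month in which week 1 contains the 1st, as the week1 column in the test suggests. All function names are hypothetical, and start is the number that appears three times in the SQL expression.

```python
from datetime import date, timedelta


def spark_dayofweek(d: date) -> int:
    # Mirrors Spark's dayofweek(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday.
    return d.isoweekday() % 7 + 1


def sunday_week_of_month(d: date) -> int:
    # Assumed stand-in for date_format(col_date, 'W'): Sunday-to-Saturday weeks.
    offset = spark_dayofweek(d.replace(day=1)) - 1
    return (d.day - 1 + offset) // 7 + 1


def corrected_week(d: date, start: int) -> int:
    # Python transcription of the CASE expression.
    # start: 2 = Monday, 3 = Tuesday, ..., 7 = Saturday.
    first_before = spark_dayofweek(d.replace(day=1)) < start
    date_before = spark_dayofweek(d) < start
    if first_before:
        adjust = 0 if date_before else 1
    else:
        adjust = -1 if date_before else 0
    return sunday_week_of_month(d) + adjust


def direct_week(d: date, start: int) -> int:
    # Independent computation: weeks that begin on the `start` day.
    offset = (spark_dayofweek(d.replace(day=1)) - start) % 7
    return (d.day - 1 + offset) // 7 + 1


def mismatches(year: int) -> list:
    # (date, start) pairs in `year` where the two computations disagree.
    bad = []
    d = date(year, 1, 1)
    while d.year == year:
        for start in range(2, 8):
            if corrected_week(d, start) != direct_week(d, start):
                bad.append((d, start))
        d += timedelta(days=1)
    return bad


print(mismatches(2022))  # an empty list means the two approaches agree
```

This only checks the arithmetic of the correction. It does not verify what Spark itself returns for the 'W' pattern on a given version or configuration.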