用例-我试图从数据集中找到客户的每周频率。现在,并不是所有的客户都有"事件"发生在所有的几周内,我需要在"计数"列中用零值填充它们。
我试图使用PrestoSQL的序列函数来实现这一点。然而,这需要我从客户的订单中获得最长周的价值(我不想硬编码,因为结果会进入BI工具,我不想每周手动更新(
with all_orders_2020 as (select customer, cast(date_parse(orderdate, '%Y-%m-%d') as date) as order_date
from orders
where orderdate > '2020-01-01' and customer in (select customer from some_customers)),
orders_with_week_number as (select *, week(order_date) as week_number from all_orders_2020),
weekly_count as (select customer, week_number, count(*) as ride_count from orders_with_week_number
where customer = {{some_customer}} group by customer, week_number)
SELECT
week_number
FROM
(VALUES
(SEQUENCE(1,(select max(week_number) from weekly_count)))
) AS t1(week_array)
CROSS JOIN
UNNEST(week_array) AS t2(week_number)
Presto抱怨这句话-
Unexpected subquery expression in logical plan: (SELECT "max"(week_number)
FROM
weekly_count
)
有什么线索可以做到这一点吗?
有一个类似的用例,并遵循下面的示例:https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html
取出SEQUENCE并使用WITH子句定义子查询:
WITH dataset AS (
SELECT SEQUENCE(1, (SELECT MAX(week_number) FROM weekly_count)) AS week_array
)
SELECT week_number FROM dataset
CROSS JOIN UNNEST(week_array) as t(week_number)