Select distinct customer_id

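For each store_id, I want to count at the article_id level:

How many of the shared article_ids arrived first at store_A and at store_B, respectively.
For example, if the arrival_timestamp of article_id=2 is earlier for store_A than for store_B, that article counts toward store_A.

See the following example: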

Main table


arrival_timestamp           article_id   store_id
2019-04-01 11:04             2            A
2019-04-01 13:12             2            B
2019-04-01 08:24             4            A
2019-04-01 10:24             4            B
2019-04-10 07:00             7            A
2019-04-10 10:14             7            B
2019-04-23 07:34             9            A
2019-04-23 05:52             9            B

Output table


storeA_count_first_articles     storeB_count_first_articles
3                                1

You can use two levels of aggregation:

select
sum(case when arrival_timestamp_a < arrival_timestamp_b then 1 else 0 end) storeA_count_first_articles,
sum(case when arrival_timestamp_b < arrival_timestamp_a then 1 else 0 end) storeB_count_first_articles
from (
select 
article_id,
min(case when store_id = 'A' then arrival_timestamp end) arrival_timestamp_a,
min(case when store_id = 'B' then arrival_timestamp end) arrival_timestamp_b
from mytable
group by article_id
) t

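The subquery uses conditional aggregation to compute the first arrival timestamp of each article in each store. The outer query then compares the two first-arrival timestamps of each article and produces the final result. An article that appears in only one store has a null timestamp for the other store, so neither comparison is true and the article is not counted.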
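If you don't have a Presto cluster at hand, here is a minimal sketch for sanity-checking the conditional-aggregation query against the sample data. It assumes Python's built-in sqlite3 module as a stand-in engine (SQLite is not Presto, though this particular query uses only standard SQL), and it stores the timestamps as text in `YYYY-MM-DD HH:MM` form so that string comparison matches chronological order. The function name `count_first_arrivals` is made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (arrival_timestamp, article_id, store_id).
ROWS = [
    ("2019-04-01 11:04", 2, "A"),
    ("2019-04-01 13:12", 2, "B"),
    ("2019-04-01 08:24", 4, "A"),
    ("2019-04-01 10:24", 4, "B"),
    ("2019-04-10 07:00", 7, "A"),
    ("2019-04-10 10:14", 7, "B"),
    ("2019-04-23 07:34", 9, "A"),
    ("2019-04-23 05:52", 9, "B"),
]

QUERY = """
select
    sum(case when arrival_timestamp_a < arrival_timestamp_b then 1 else 0 end) as storeA_count_first_articles,
    sum(case when arrival_timestamp_b < arrival_timestamp_a then 1 else 0 end) as storeB_count_first_articles
from (
    select
        article_id,
        min(case when store_id = 'A' then arrival_timestamp end) as arrival_timestamp_a,
        min(case when store_id = 'B' then arrival_timestamp end) as arrival_timestamp_b
    from mytable
    group by article_id
) t
"""


def count_first_arrivals(rows):
    """Load rows into an in-memory table and return (store A count, store B count)."""
    conn = sqlite3.connect(":memory:")
    # Timestamps are kept as text; this fixed-width format sorts chronologically.
    conn.execute(
        "create table mytable (arrival_timestamp text, article_id integer, store_id text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_first_arrivals(ROWS))
```

On the sample data this returns `(3, 1)`, matching the expected output table. Adding an article that only store A carries leaves the result unchanged, because the comparison against a null timestamp is never true.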

Another option uses row_number(), which avoids the conditional logic and the aggregation in the subquery:

select 
sum(case when store_id = 'A' then 1 else 0 end) storeA_count_first_articles,
sum(case when store_id = 'B' then 1 else 0 end) storeB_count_first_articles
from (
select 
t.*, 
row_number() over(partition by article_id order by arrival_timestamp) rn
from mytable t
) t
where rn = 1
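One behavioural difference is worth checking before picking this variant: `rn = 1` keeps the earliest row of every article, including articles that only one store carries. The sketch below runs the row_number() query through Python's sqlite3 module (an assumption made for the sake of a runnable check; it needs a SQLite build with window-function support, 3.25 or later). The helper name `count_first_by_row_number` and the extra article 11 are hypothetical.

```python
import sqlite3

# Sample rows from the question: (arrival_timestamp, article_id, store_id).
ROWS = [
    ("2019-04-01 11:04", 2, "A"),
    ("2019-04-01 13:12", 2, "B"),
    ("2019-04-01 08:24", 4, "A"),
    ("2019-04-01 10:24", 4, "B"),
    ("2019-04-10 07:00", 7, "A"),
    ("2019-04-10 10:14", 7, "B"),
    ("2019-04-23 07:34", 9, "A"),
    ("2019-04-23 05:52", 9, "B"),
]

QUERY = """
select
    sum(case when store_id = 'A' then 1 else 0 end) as storeA_count_first_articles,
    sum(case when store_id = 'B' then 1 else 0 end) as storeB_count_first_articles
from (
    select
        t.*,
        row_number() over (partition by article_id order by arrival_timestamp) as rn
    from mytable t
) t
where rn = 1
"""


def count_first_by_row_number(rows):
    """Return (store A count, store B count) using the row_number() variant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (arrival_timestamp text, article_id integer, store_id text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_first_by_row_number(ROWS))
    # Hypothetical article 11 exists only in store A.
    print(count_first_by_row_number(ROWS + [("2019-05-01 09:00", 11, "A")]))
```

With the sample rows the result is `(3, 1)`. With the extra single-store article it becomes `(4, 1)`, whereas the conditional-aggregation query ignores that article. If only shared articles should count, the rows need to be restricted to articles present in both stores first.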

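I'm not familiar with Presto, but based on their documentation I think this should work. This answer is a general solution that doesn't require naming store A and store B explicitly in the query.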

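SELECT
q.first_store_id AS store_id,
COUNT(*) AS count_first_articles
FROM
(
-- One row per article: the store with the earliest arrival_timestamp.
-- PARTITION BY article_id restarts the window for every article, and
-- DISTINCT collapses the per-row window output to one row per article.
-- mytable stands in for your table name.
SELECT DISTINCT
article_id,
first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
FROM
mytable
) AS q
GROUP BY
q.first_store_id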

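This works for any number of store_id values without having to define each column by hand, and because the results are row-oriented rather than column-oriented they are easier to process in application code. If you still want named columns, you can produce them in an outer query or use PIVOT/UNPIVOT (well, apparently Presto doesn't support PIVOT yet, but you can still do it in application code).

You'd get a result like this: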

store_id        count_first_articles
A                            3
B                            1

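The magic is in first_value, which is a window function; Presto ships with a decent set of built-in window functions.

To convert the row-based result into the column-based output from the original example, do this: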
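To illustrate the "any number of stores" point, here is a small sketch that runs a first_value query on the sample data plus two rows for a hypothetical third store C. It assumes the window is partitioned by article_id so that every article gets its own first store, and it uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Presto. The `order by` is added only to make the printed order deterministic.

```python
import sqlite3

# Sample rows from the question plus two made-up rows for a third store "C".
ROWS = [
    ("2019-04-01 11:04", 2, "A"),
    ("2019-04-01 13:12", 2, "B"),
    ("2019-04-01 08:24", 4, "A"),
    ("2019-04-01 10:24", 4, "B"),
    ("2019-04-01 07:00", 4, "C"),  # hypothetical: C gets article 4 first
    ("2019-04-10 07:00", 7, "A"),
    ("2019-04-10 10:14", 7, "B"),
    ("2019-04-23 07:34", 9, "A"),
    ("2019-04-23 05:52", 9, "B"),
    ("2019-04-23 09:00", 9, "C"),  # hypothetical: C gets article 9 last
]

QUERY = """
select
    q.first_store_id as store_id,
    count(*) as count_first_articles
from (
    select distinct
        article_id,
        first_value(store_id) over (
            partition by article_id order by arrival_timestamp
        ) as first_store_id
    from mytable
) as q
group by q.first_store_id
order by q.first_store_id
"""


def first_arrivals_per_store(rows):
    """Return a list of (store_id, count_first_articles), one entry per store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (arrival_timestamp text, article_id integer, store_id text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for store_id, count in first_arrivals_per_store(ROWS):
        print(store_id, count)
```

This yields A = 2, B = 1, C = 1 without the query ever mentioning a store by name. A store that never arrives first simply has no row in the result.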

SELECT
SUM( CASE WHEN q2.store_id = 'A' THEN q2.count_first_articles END ) AS storeA_count_first_articles,
SUM( CASE WHEN q2.store_id = 'B' THEN q2.count_first_articles END ) AS storeB_count_first_articles
FROM
(
-- Row-oriented counts: one row per store that arrived first at least once.
SELECT
q.first_store_id AS store_id,
COUNT(*) AS count_first_articles
FROM
(
-- One row per article: the store with the earliest arrival_timestamp.
-- mytable stands in for your table name.
SELECT DISTINCT
article_id,
first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
FROM
mytable
) AS q
GROUP BY
q.first_store_id
) AS q2

Which gives:

storeA_count_first_articles     storeB_count_first_articles
3                                1

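Although this answer looks more complex than the others (well, more nested), the row-oriented query is a general solution that needs no changes when you want to look at more stores than just 'A' and 'B'.

You can use two levels of aggregation. One method is: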

select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
sum(case when first_store_id = 'B' then 1 else 0 end) as first_b       
from (select distinct article_id,
first_value(store_id) over (partition by article_id order by arrival_timestamp) as first_store_id
from t
) t;

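Note: the inner aggregation uses select distinct for convenience. The outer aggregation doesn't use group by because the result set needs only one row.

This can also be written in Presto using min_by() and an explicit aggregation: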

select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
sum(case when first_store_id = 'B' then 1 else 0 end) as first_b       
from (select article_id, min_by(store_id, arrival_timestamp) as first_store_id
from t
group by article_id
) t;

Note: both queries assume you have no other stores. If you do and you only care about these two, add where store_id in ('A', 'B') to the inner query.
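The effect of that filter is easy to see on a small example. The sketch below runs the first_value() variant with and without the `where` clause on the sample data plus one made-up row for a third store C. It assumes Python's sqlite3 module (SQLite 3.25 or later) as a stand-in engine, which is also why it doesn't use min_by(): SQLite has no such function. The helper name `count_first` and its `only_ab` flag are hypothetical.

```python
import sqlite3

# Sample rows from the question plus one made-up row for a third store "C".
ROWS = [
    ("2019-04-01 11:04", 2, "A"),
    ("2019-04-01 13:12", 2, "B"),
    ("2019-04-01 08:24", 4, "A"),
    ("2019-04-01 10:24", 4, "B"),
    ("2019-04-01 07:00", 4, "C"),  # hypothetical: C gets article 4 before A and B
    ("2019-04-10 07:00", 7, "A"),
    ("2019-04-10 10:14", 7, "B"),
    ("2019-04-23 07:34", 9, "A"),
    ("2019-04-23 05:52", 9, "B"),
]

QUERY_TEMPLATE = """
select
    sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
    sum(case when first_store_id = 'B' then 1 else 0 end) as first_b
from (
    select distinct
        article_id,
        first_value(store_id) over (
            partition by article_id order by arrival_timestamp
        ) as first_store_id
    from t
    {where_clause}
) firsts
"""


def count_first(rows, only_ab):
    """Return (first_a, first_b); only_ab=True restricts the input to stores A and B."""
    where_clause = "where store_id in ('A', 'B')" if only_ab else ""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (arrival_timestamp text, article_id integer, store_id text)"
    )
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY_TEMPLATE.format(where_clause=where_clause)).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_first(ROWS, only_ab=False))
    print(count_first(ROWS, only_ab=True))
```

Without the filter the result is `(2, 1)`: store C is first for article 4, so that article counts for neither A nor B. With the filter the rows for C are dropped before the window function runs, and the result is `(3, 1)` again.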
