聚合表中截至表中每行日期的数据

  • 本文关键字:日期 数据 sql presto
  • 更新时间 :
  • 英文 :


如果我运行以下查询,它会产生1.6M行:

SELECT 
customer_id,
credit_id,
date
FROM 
wallet
WHERE
credit_title = 'Topups'
AND credit_type = 'Credit Card Topups'
AND day >= DATE '2017-11-06'
AND day <= DATE '2020-04-03'

我现在正试图为上述查询中的每一行获取该客户截至该行日期的所有信贷交易的计数和总额。我尝试过下面的查询(它本身就是一个联接(,但结果是130万行。为什么要在联接中删除行?credit_id是该表中的唯一标识符。

SELECT
customer_id,
credit_id,
COALESCE(COUNT(wallet_agg.credit_id), 0) AS topup_count_to_date,
COALESCE(SUM(wallet_agg.credit_amt_usd), 0) AS topup_amount_to_date
FROM
wallet
LEFT JOIN
wallet AS wallet_agg
ON
wallet.customer_id = wallet_agg.customer_id
AND wallet_agg.date < wallet.date
WHERE
wallet.credit_title = 'Topups'
AND wallet.credit_type = 'Credit Card Topups'
AND wallet.day >= DATE '2017-11-06'
AND wallet.day <= DATE '2020-04-03'
AND wallet_agg.credit_title = 'Topups'
AND wallet_agg.credit_type = 'Credit Card Topups'

下面是我正在尝试的一个简单的演示,它得到了我所期望的结果。我上面更复杂的查询的逻辑有何不同?

您的JOIN正被WHERE条件转换为内部联接。您需要将第二个表上的条件移动到ON子句中:

FROM wallet LEFT JOIN
wallet AS wallet_agg
ON wallet.customer_id = wallet_agg.customer_id AND
wallet_agg.date < wallet.date AND
wallet_agg.credit_title = 'Topups'
wallet_agg.credit_type = 'Credit Card Topups'
WHERE wallet.credit_title = 'Topups' AND
wallet.credit_type = 'Credit Card Topups' AND
wallet.day >= DATE '2017-11-06' AND
wallet.day <= DATE '2020-04-03'

当然,对于这个问题,聚合是一种过度杀伤。您应该只使用窗口功能:

SELECT w.*
FROM (SELECT w.customer_id, w.credit_id, w.date,
COUNT(*) OVER (PARTITION BY customer_id, credit_title, credit_type ORDER BY date) as topup_count_to_date,
SUM(amount) OVER (PARTITION BY customer_id, credit_title, credit_type ORDER BY date) as topup_amount_to_date
FROM wallet w
WHERE w.credit_title = 'Topups' AND
w.credit_type = 'Credit Card Topups' 
) w
WHERE w.day >= DATE '2017-11-06' AND
w.day <= DATE '2020-04-03';

相关内容

  • 没有找到相关文章

最新更新