如果我运行以下查询,它会产生1.6M行:
SELECT
customer_id,
credit_id,
date
FROM
wallet
WHERE
credit_title = 'Topups'
AND credit_type = 'Credit Card Topups'
AND day >= DATE '2017-11-06'
AND day <= DATE '2020-04-03'
我现在正试图为上述查询中的每一行获取该客户截至该行日期的所有信贷交易的计数和总额。我尝试过下面的查询(它本身就是一个联接(,但结果是130万行。为什么要在联接中删除行?credit_id
是该表中的唯一标识符。
SELECT
customer_id,
credit_id,
COALESCE(COUNT(wallet_agg.credit_id), 0) AS topup_count_to_date,
COALESCE(SUM(wallet_agg.credit_amt_usd), 0) AS topup_amount_to_date
FROM
wallet
LEFT JOIN
wallet AS wallet_agg
ON
wallet.customer_id = wallet_agg.customer_id
AND wallet_agg.date < wallet.date
WHERE
wallet.credit_title = 'Topups'
AND wallet.credit_type = 'Credit Card Topups'
AND wallet.day >= DATE '2017-11-06'
AND wallet.day <= DATE '2020-04-03'
AND wallet_agg.credit_title = 'Topups'
AND wallet_agg.credit_type = 'Credit Card Topups'
下面是我正在尝试的一个简单的演示,它得到了我所期望的结果。我上面更复杂的查询的逻辑有何不同?
您的JOIN
正被WHERE
条件转换为内部联接。您需要将第二个表上的条件移动到ON
子句中:
FROM wallet LEFT JOIN
wallet AS wallet_agg
ON wallet.customer_id = wallet_agg.customer_id AND
wallet_agg.date < wallet.date AND
wallet_agg.credit_title = 'Topups'
wallet_agg.credit_type = 'Credit Card Topups'
WHERE wallet.credit_title = 'Topups' AND
wallet.credit_type = 'Credit Card Topups' AND
wallet.day >= DATE '2017-11-06' AND
wallet.day <= DATE '2020-04-03'
当然,对于这个问题,聚合是一种过度杀伤。您应该只使用窗口功能:
SELECT w.*
FROM (SELECT w.customer_id, w.credit_id, w.date,
COUNT(*) OVER (PARTITION BY customer_id, credit_title, credit_type ORDER BY date) as topup_count_to_date,
SUM(amount) OVER (PARTITION BY customer_id, credit_title, credit_type ORDER BY date) as topup_amount_to_date
FROM wallet w
WHERE w.credit_title = 'Topups' AND
w.credit_type = 'Credit Card Topups'
) w
WHERE w.day >= DATE '2017-11-06' AND
w.day <= DATE '2020-04-03';