Filtering time series data based on multiple conditions



I have a table that stores time series data:

customer_id  transaction_type  transaction_date  transaction_value
1            buy               2022-12-04        100.0
1            sell              2022-12-04        80.0
2            buy               2022-12-04        120.0
2            sell              2022-12-03        120.0
1            buy               2022-12-02        90.0
1            sell              2022-12-02        70.0
2            buy               2022-12-01        110.0
2            sell              2022-12-01        110.0

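You can use a CTE with ROW_NUMBER to rank each customer's transactions per type, then check only the most recent one of each type: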

-- Rank each customer's transactions per type, newest first
WITH CTE AS (
    SELECT "customer_id", "transaction_type", "transaction_date", "transaction_value",
           ROW_NUMBER() OVER (PARTITION BY "customer_id", "transaction_type"
                              ORDER BY "transaction_date" DESC) rn
    FROM tab1
)
SELECT "customer_id", "transaction_type", "transaction_date", "transaction_value"
FROM CTE
WHERE rn = 1  -- latest transaction of each type
  -- the latest row must pass the threshold for its type
  AND CASE WHEN "transaction_type" = 'buy'  THEN ("transaction_value" > 90)
           WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100)
           ELSE FALSE END
  -- and the customer must have two such rows (latest buy and latest sell)
  AND (SELECT COUNT(*)
       FROM CTE c1
       WHERE c1."customer_id" = CTE."customer_id"
         AND c1.rn = 1
         AND CASE WHEN c1."transaction_type" = 'buy'  THEN (c1."transaction_value" > 90)
                  WHEN c1."transaction_type" = 'sell' THEN (c1."transaction_value" > 100)
                  ELSE FALSE END) = 2
customer_id  transaction_type  transaction_date  transaction_value
2            buy               2022-12-04        120.0
2            sell              2022-12-03        120.0
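To try this approach without a PostgreSQL server, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. Several things here are assumptions of the sketch rather than part of the answer: the `rows` list is my reading of the question's sample table, a SQLite build with window-function support (3.25 or later) is required, the repeated CASE expression is factored into a hypothetical `passes` column, the outer alias `o` is added so the correlated subquery is unambiguous in SQLite, and the ORDER BY exists only to make the output order deterministic.

```python
import sqlite3

# Sample rows as read from the question's table (an assumption of this sketch).
rows = [
    (1, "buy",  "2022-12-04", 100.0),
    (1, "sell", "2022-12-04",  80.0),
    (2, "buy",  "2022-12-04", 120.0),
    (2, "sell", "2022-12-03", 120.0),
    (1, "buy",  "2022-12-02",  90.0),
    (1, "sell", "2022-12-02",  70.0),
    (2, "buy",  "2022-12-01", 110.0),
    (2, "sell", "2022-12-01", 110.0),
]

QUERY = """
WITH cte AS (
    SELECT customer_id, transaction_type, transaction_date, transaction_value,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id, transaction_type
               ORDER BY transaction_date DESC
           ) AS rn,
           -- 1 when the row meets the threshold for its type, else 0
           CASE WHEN transaction_type = 'buy'  THEN transaction_value > 90
                WHEN transaction_type = 'sell' THEN transaction_value > 100
                ELSE 0 END AS passes
    FROM tab1
)
SELECT o.customer_id, o.transaction_type, o.transaction_date, o.transaction_value
FROM cte AS o
WHERE o.rn = 1
  AND o.passes
  AND (SELECT COUNT(*) FROM cte AS c1
       WHERE c1.customer_id = o.customer_id
         AND c1.rn = 1
         AND c1.passes) = 2
ORDER BY o.customer_id, o.transaction_type
"""


def latest_passing(sample):
    """Load the sample into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tab1 (customer_id INTEGER, transaction_type TEXT, "
            "transaction_date TEXT, transaction_value REAL)"
        )
        conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?, ?)", sample)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = latest_passing(rows)
for row in result:
    print(row)
```

ISO-formatted date strings sort chronologically, which is why storing the dates as TEXT is enough for the ORDER BY in this sketch.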

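Using distinct on with a custom order, select each customer's latest transaction per type from the rows that match any of your criteria (hence the OR); this is the latest CTE. Then count the resulting rows per customer with count as a window function; this is the latest_with_count CTE. Finally, keep the rows whose count equals the number of criteria, i.e. all criteria are met.
This may be a somewhat verbose and abstract template, but hopefully it helps with the general problem. The idea carries over to any number of criteria.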

with t as
(
  /*
   Your query goes here, with the criteria combined in DISJUNCTION (OR), e.g.
  */
  select *
  from customer_transactions
  where (transaction_type = 'buy'  and transaction_value > 90)
     or (transaction_type = 'sell' and transaction_value > 100)
),
latest as
(
  -- pick the latest per customer & type
  select distinct on (customer_id, transaction_type) *
  from t
  order by customer_id, transaction_type, transaction_date desc
),
latest_with_count as
(
  -- number of surviving rows per customer
  select *, count(*) over (partition by customer_id) cnt
  from latest
)
select *
from latest_with_count
where cnt = 2 -- the number of criteria
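The same steps can be traced outside the database. Below is a rough Python sketch of the filter, latest-per-type, count, and select sequence; the `transactions` list is my reading of the question's sample table, and the `thresholds` mapping and the function name `customers_meeting_all` are hypothetical names introduced only for illustration.

```python
from collections import Counter

# Sample rows as read from the question's table (an assumption of this sketch):
# (customer_id, transaction_type, transaction_date, transaction_value)
transactions = [
    (1, "buy",  "2022-12-04", 100.0),
    (1, "sell", "2022-12-04",  80.0),
    (2, "buy",  "2022-12-04", 120.0),
    (2, "sell", "2022-12-03", 120.0),
    (1, "buy",  "2022-12-02",  90.0),
    (1, "sell", "2022-12-02",  70.0),
    (2, "buy",  "2022-12-01", 110.0),
    (2, "sell", "2022-12-01", 110.0),
]

# Hypothetical criteria: one lower bound per transaction type.
thresholds = {"buy": 90.0, "sell": 100.0}


def customers_meeting_all(txns, limits):
    # Step 1 (CTE t): keep rows that satisfy the criterion for their own type.
    matching = [r for r in txns if r[1] in limits and r[3] > limits[r[1]]]

    # Step 2 (CTE latest): newest matching row per (customer_id, type).
    # ISO date strings compare chronologically, so plain > is enough.
    latest = {}
    for r in matching:
        key = (r[0], r[1])
        if key not in latest or r[2] > latest[key][2]:
            latest[key] = r

    # Step 3 (CTE latest_with_count): surviving rows per customer.
    counts = Counter(customer for customer, _ in latest)

    # Step 4: keep customers that have one row for every criterion.
    return sorted(r for r in latest.values() if counts[r[0]] == len(limits))


selected = customers_meeting_all(transactions, thresholds)
for row in selected:
    print(row)
```

Because the count is compared against `len(limits)`, adding a third criterion to the mapping needs no other change, which mirrors replacing the literal 2 in the SQL template.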
