I have a table of customers and their purchases, like this:
customer shop time
----------------------------
1 5 13.30
1 5 14.33
1 10 22.17
2 3 12.15
2 1 13.30
2 1 15.55
2 3 17.29
Since I am interested in the changes of shop, I need the following output:
customer shop time
----------------------------
1 5 13.30
1 10 22.17
2 3 12.15
2 1 13.30
2 3 17.29
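I tried using
ROW_NUMBER() OVER (PARTITION BY customer, shop ORDER BY time ASC) AS counter
and then keeping only the rows where counter = 1. However, this breaks when a customer returns to the same shop later, as with customer = 2 and shop = 3 in my example.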
I came up with this:
WITH a AS
(
    SELECT
        customer, shop, time,
        ROW_NUMBER() OVER (PARTITION BY customer ORDER BY time ASC) AS counter
    FROM
        db
)
-- rows whose shop differs from the same customer's previous row
SELECT a1.*
FROM a a1
JOIN a AS a2 ON (a1.customer = a2.customer AND a2.counter + 1 = a1.counter AND a2.shop <> a1.shop)
UNION
-- plus each customer's first row
SELECT a.*
FROM a
WHERE counter = 1
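However, this is very inefficient, and running it in AWS, where my data lives, fails with an error telling me
Query exhausted resources at this scale factor
Is there any way to make the query more efficient?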
This is a gaps-and-islands problem. But the simplest solution uses lag():
select customer, shop, time
from (select t.*, lag(shop) over (partition by customer order by time) as prev_shop
from t
) t
where prev_shop is null or prev_shop <> shop;
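To sanity-check the lag() query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later). The table name t follows the query above, while the helper shop_changes, the subquery alias s, and the added ORDER BY are only there for illustration. It is not the AWS setup from the question, so it says nothing about performance there.

```python
import sqlite3

# Sample rows from the question: (customer, shop, time).
ROWS = [
    (1, 5, 13.30),
    (1, 5, 14.33),
    (1, 10, 22.17),
    (2, 3, 12.15),
    (2, 1, 13.30),
    (2, 1, 15.55),
    (2, 3, 17.29),
]

# Same logic as the query above; ORDER BY is added only to make
# the output order deterministic for the check.
QUERY = """
SELECT customer, shop, time
FROM (
    SELECT t.*,
           LAG(shop) OVER (PARTITION BY customer ORDER BY time) AS prev_shop
    FROM t
) AS s
WHERE prev_shop IS NULL OR prev_shop <> shop
ORDER BY customer, time
"""


def shop_changes(rows):
    """Hypothetical helper: load rows into table t and return the shop changes."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (customer INTEGER, shop INTEGER, time REAL)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in shop_changes(ROWS):
        print(row)
```

Each row is compared only with the same customer's previous row, so a later return to an earlier shop (customer 2 going back to shop 3) is kept, and no self-join is needed.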