我有一个如下的数据集:
transaction_id | store_id | product_id | product_name | 1 | 100 | p001 | product_1 | 1
---|---|---|---|---|
100 | p002 | product_2 | ||
100 | p003 | product_3 | ||
100 | p002 | product_2 | ||
100 | p003 | product_3 | ||
100 | p002 | product_2 | ||
100 | p003 | product_3 | ||
100 | p001 | product_1 | ||
100 | p003 | product_3 | ||
101 | p002 | product_2 | ||
101 | p003 | product_3 | ||
9 | 101 | p001 | product_1 | |
9 | 101 | p002 | product_2 | |
101 | p001 | product_1 | ||
3 | 101 | p002 | product_2 | |
3 | 101 | p003 | product_3 | |
101 | p001 | product_1 | ||
101 | p002 | product_2 |
您可以尝试以下操作,它在基于商店和产品对进行聚合之前执行自连接。row_number
用于检索每个商店的前2个产品对。
SELECT
store_id, freq_prod_ids,count_of_transactions
FROM (
SELECT
t1.store_id,
CONCAT(t1.product_id,', ',t2.product_id) as freq_prod_ids,
COUNT(1) as count_of_transactions,
ROW_NUMBER() OVER (PARTITION BY t1.store_id ORDER BY COUNT(1) DESC) as rn
FROM my_table t1
INNER JOIN my_table t2 on t1.store_id = t2.store_id and
t1.product_id < t2.product_id and
t1.transaction_id = t2.transaction_id
GROUP BY t1.store_id,CONCAT(t1.product_id,', ',t2.product_id)
) t3 WHERE rn <=2
查看工作演示
让我知道这是否适合你。