I want to run a market basket analysis on sales data. For that I need two values, order_id and product_id.
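I only want the order_id and product_id rows in which each value occurs more than once in the table, not necessarily in the same row. I do not want to drop those repeated values; I want to drop an order_id or a product_id only if it appears just once in the database. The result should be ordered by order_id and should look as follows.
From this: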
order_id  product_id
1         1
1         2
1         3
1         4
2         1
2         2
2         3
3         1
3         2
3         5
4         2
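The rule in the question can be sketched outside SQL to pin down what the expected output is. This is a minimal sketch, not the asker's code: the `rows` list is a hypothetical sample modeled on the question's example, and it assumes a single filtering pass (values that become singletons only after other rows are removed are not removed again).

```python
from collections import Counter

# Hypothetical sample rows (order_id, product_id), modeled on the question's example.
rows = [
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2), (3, 5),
    (4, 2),
]

# Count how often each order_id and each product_id occurs in the whole table.
order_counts = Counter(order_id for order_id, _ in rows)
product_counts = Counter(product_id for _, product_id in rows)

# Keep a row only if both of its values occur more than once, sorted by order_id.
kept = sorted(
    (o, p) for o, p in rows
    if order_counts[o] > 1 and product_counts[p] > 1
)
print(kept)
```

With this sample, the rows (1, 4), (3, 5) and (4, 2) drop out, because product 4, product 5 and order 4 each occur only once.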
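I'm sure there is a more elegant solution, but if I understand your question correctly, you want the rows whose order_id or product_id appears more than once in the DB, in which case I believe this will work:
Edit: if you need both of them to occur more than once, then I would simply join each one to its own subquery: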
SELECT t1.order_id, t1.product_id
FROM order_items t1
INNER JOIN (
    SELECT order_id FROM order_items GROUP BY order_id HAVING COUNT(*) > 1
) t2
    ON t1.order_id = t2.order_id
INNER JOIN (
    SELECT product_id FROM order_items GROUP BY product_id HAVING COUNT(*) > 1
) t3
    ON t1.product_id = t3.product_id
ORDER BY t1.order_id
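To try the two-subquery join without a database server, it can be run against an in-memory SQLite table. This is only a sketch under assumptions: the sample rows are hypothetical (modeled on the question's example), SQLite stands in for whatever engine the asker uses, and `product_id` is added to the `ORDER BY` purely to make the output order deterministic.

```python
import sqlite3

# Hypothetical sample rows (order_id, product_id), modeled on the question's example.
rows = [
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2), (3, 5),
    (4, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO order_items VALUES (?, ?)", rows)

# Each subquery lists the ids that occur more than once; the inner joins
# keep only rows whose order_id AND product_id are both in those lists.
join_query = """
SELECT t1.order_id, t1.product_id
FROM order_items t1
INNER JOIN (
    SELECT order_id FROM order_items GROUP BY order_id HAVING COUNT(*) > 1
) t2 ON t1.order_id = t2.order_id
INNER JOIN (
    SELECT product_id FROM order_items GROUP BY product_id HAVING COUNT(*) > 1
) t3 ON t1.product_id = t3.product_id
ORDER BY t1.order_id, t1.product_id
"""
join_result = conn.execute(join_query).fetchall()
print(join_result)
```

Because each subquery returns every id at most once, the joins filter rows without duplicating them.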
A couple of windowed aggregates would probably work best here:
WITH CTE AS(
SELECT order_id,
product_id,
COUNT(1) OVER (PARTITION BY order_id) AS orders,
COUNT(1) OVER (PARTITION BY product_id) AS products
FROM dbo.YourTable)
SELECT order_id,
product_id
FROM CTE
WHERE orders > 1
AND products > 1;
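The windowed version can be checked the same way. This sketch assumes an SQLite build with window-function support (3.25 or later) and uses a plain table named `YourTable` in place of `dbo.YourTable`, since the `dbo` schema prefix is specific to SQL Server; the sample rows and the final `ORDER BY` are also assumptions added for a reproducible result.

```python
import sqlite3

# Hypothetical sample rows (order_id, product_id), modeled on the question's example.
rows = [
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2), (3, 5),
    (4, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (order_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)

# Each COUNT(...) OVER (PARTITION BY ...) attaches to every row the number of
# rows sharing its order_id (or product_id), so one scan yields both counts.
window_query = """
WITH CTE AS (
    SELECT order_id,
           product_id,
           COUNT(1) OVER (PARTITION BY order_id)   AS orders,
           COUNT(1) OVER (PARTITION BY product_id) AS products
    FROM YourTable
)
SELECT order_id, product_id
FROM CTE
WHERE orders > 1
  AND products > 1
ORDER BY order_id, product_id
"""
window_result = conn.execute(window_query).fetchall()
print(window_result)
```

The CTE is needed because window functions cannot be referenced directly in a `WHERE` clause of the same query level.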
The IN keyword can make the query simpler, but I'm not sure about the performance:
select order_id, product_id from order_items
where order_id not in (
select order_id from order_items group by order_id having count(*) = 1
) and product_id not in (
select product_id from order_items group by product_id having count(*) = 1
);
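Apart from performance, one thing worth checking with `NOT IN` is how it behaves when a column can contain NULL: if a subquery returns a NULL, `x NOT IN (...)` can never evaluate to true, so the query returns no rows at all. The sketch below illustrates this in SQLite with hypothetical sample rows; the extra `(5, NULL)` row and the `IS NOT NULL` guard are assumptions added for illustration, not part of the answer.

```python
import sqlite3

# Hypothetical sample rows (order_id, product_id), modeled on the question's example.
rows = [
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2), (3, 5),
    (4, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO order_items VALUES (?, ?)", rows)

# The answer's query, with an ORDER BY added for a deterministic result.
not_in_query = """
select order_id, product_id from order_items
where order_id not in (
    select order_id from order_items group by order_id having count(*) = 1
) and product_id not in (
    select product_id from order_items group by product_id having count(*) = 1
)
order by order_id, product_id
"""
not_in_result = conn.execute(not_in_query).fetchall()

# A single row with a NULL product_id makes the second subquery return NULL,
# and "product_id NOT IN (..., NULL)" is then never true for any row.
conn.execute("INSERT INTO order_items VALUES (5, NULL)")
with_null = conn.execute(not_in_query).fetchall()

# Guarded variant: exclude NULLs inside the subqueries.
guarded_query = """
select order_id, product_id from order_items
where order_id not in (
    select order_id from order_items
    where order_id is not null
    group by order_id having count(*) = 1
) and product_id not in (
    select product_id from order_items
    where product_id is not null
    group by product_id having count(*) = 1
)
order by order_id, product_id
"""
guarded = conn.execute(guarded_query).fetchall()
print(not_in_result, with_null, guarded)
```

If both columns are declared `NOT NULL`, the guard is unnecessary and the answer's query behaves as intended.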