查找与类别交互的重复用户数



我是SQL的新手,正在尝试弄清楚以下内容。想象一下下表:

user_id, category_id
1, 12344
1, 12344
1, 12345
2, 12345
2, 12345
3, 12344
3, 12344

等等..我想找到每个类别获得的重复用户数量。

所以,在上面的例子中..

12344, 2 (because user_id 1 and 3 are repeated users)
12345, 1 (user_id 2 is repeated user.. 1 is not as that user visited just once)

我如何在sql/hive中解决这个问题?

例如

DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,user_id INT NOT NULL
,category_id INT NOT NULL
);
INSERT INTO my_table (user_id,category_id) VALUES
(1, 12344),
(1, 12344),
(1, 12345),
(2, 12345),
(2, 12345),
(3, 12344),
(3, 12344);
SELECT category_id
     , COUNT(*) total 
  FROM 
     ( SELECT x.* 
         FROM my_table x 
         JOIN my_table y 
           ON y.user_id = x.user_id 
          AND y.category_id = x.category_id 
          AND y.id < x.id
     ) a 
 GROUP 
    BY category_id;
+-------------+-------+
| category_id | total |
+-------------+-------+
|       12344 |     2 |
|       12345 |     1 |
+-------------+-------+

遵循您要查找的内容有点困难,但请测试一下:

select category_id, count(user_id) from (Select category_id, user_id, count(table_primary_id) as 'total' from tablename group by category_id, user_id) a where total > 1 group by category_id
子查询计算用户访问某个类别

的次数,外部查询应计算多次访问某个类别的用户数。

最新更新