识别 SQL 中的重复记录以及主键



我有一个业务案例场景,我需要查找我们的SQL"用户"表以找出重复的电子邮件地址。我能够通过以下查询做到这一点:

SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING 
COUNT(*) > 1
ORDER BY 
DuplicateEmails DESC

我得到这样的输出:

user_email      DuplicateEmails  
--------------------------------
abc@gmail.com   2
xyz@yahoo.com   3

现在,我被要求在自己的一行中列出所有重复的记录,并显示一些其他属性,例如名字,姓氏和用户ID。所有这些信息都存储在此表"用户"中。我很难这样做。谁能帮助我或让我朝着正确的方向前进?

我的输出需要如下所示:

user_email      DuplicateEmails  FirstName      LastName       UserID
------------------------------------------------------------------------------
abc@gmail.com   2                Tim            Lentil         timLentil
abc@gmail.com   2                John           Doe            johnDoe12
xyz@yahoo.com   3                brian          boss           brianTheBoss
xyz@yahoo.com   3                Thomas         Hood           tHood
xyz@yahoo.com   3                Mark           Brown          MBrown12

有几种方法可以做到这一点。这是一个使用 cte。

with FoundDuplicates as
(
SELECT
uter_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
uter_email
HAVING 
COUNT(*) > 1
)
select fd.user_email
, fd.DuplicateEmails  
, u.FirstName      
, u.LastName       
, u.UserID
from Users u
join FoundDuplicates fd on fd.uter_email = u.uter_email
ORDER BY fd.DuplicateEmails DESC

使用count() over( Partition by ), 示例

你可以像这样解决它:

DECLARE @T TABLE
(
UserID VARCHAR(20),
FirstName NVARCHAR(45),
LastName NVARCHAR(45),
UserMail VARCHAR(45)
);
INSERT INTO @T (UserMail, FirstName, LastName, UserID) VALUES    
('abc@gmail.com', 'Tim',         'Lentil',         'timLentil'),
('abc@gmail.com', 'John',         'Doe',         'johnDoe12'),
('xyz@yahoo.com', 'brian',         'boss',         'brianTheBoss'),
('xyz@yahoo.com', 'Thomas',         'Hood',         'tHood'),
('xyz@yahoo.com', 'Mark',         'Brown',         'MBrown12');
SELECT *, COUNT (1) OVER (PARTITION BY UserMail) MailCount
FROM @T;

结果:

+--------------+-----------+----------+---------------+-----------+
|    UserID    | FirstName | LastName |   UserMail    | MailCount |
+--------------+-----------+----------+---------------+-----------+
| timLentil    | Tim       | Lentil   | abc@gmail.com |         2 |
| johnDoe12    | John      | Doe      | abc@gmail.com |         2 |
| brianTheBoss | brian     | boss     | xyz@yahoo.com |         3 |
| tHood        | Thomas    | Hood     | xyz@yahoo.com |         3 |
| MBrown12     | Mark      | Brown    | xyz@yahoo.com |         3 |
+--------------+-----------+----------+---------------+-----------+

使用如下所示的窗口函数:

SELECT u.*
FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY user_email) as numDuplicateEmails
FROM Users
) u
WHERE numDuplicateEmails > 1
ORDER BY numDuplicateEmails DESC;

我认为这也行得通。

WITH cte (
SELECT
* 
,DuplicateEmails = ROW_NUMBER() OVER (Partition  BY user_email ORder by user_email)
FROM Users
)
Select * from CTE 
where DuplicateEmails > 1

最新更新