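I have a table of contact records, and I'm trying to get a count of the duplicate records created on each date. I don't want the original instance of each email to be included in the count. I'm using SQL Server. Here is a sample table: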
| email | created_on |
| ------------- | ---------- |
| aaa@email.com | 08-16-22 |
| bbb@email.com | 08-16-22 |
| zzz@email.com | 08-16-22 |
| bbb@email.com | 07-12-22 |
| aaa@email.com | 07-12-22 |
| zzz@email.com | 06-08-22 |
| aaa@email.com | 06-08-22 |
| bbb@email.com | 04-21-22 |
I would like the query to return:
| created_on | dupe_count |
| ---------- | ---------- |
| 08-16-22 | 3 |
| 07-12-22 | 2 |
| 06-08-22 | 0 |
| 04-21-22 | 0 |
Edited to add the error message: [error message]
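I built a derived table that numbers the rows for each email in order of creation date. You then query it and ignore the date on which each email was first created (row 1). It works well for this case.
Full code: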
Create table #Temp
(
email varchar(50),
dateCreated date
)
insert into #Temp
(email, dateCreated) values
('aaa@email.com', '08-16-22'),
('bbb@email.com', '08-16-22'),
('zzz@email.com', '08-16-22'),
('bbb@email.com', '07-12-22'),
('aaa@email.com', '07-12-22'),
('zzz@email.com', '06-08-22'),
('aaa@email.com', '06-08-22'),
('bbb@email.com', '04-21-22')
-- Number each email's rows by creation date: row 1 is the original, later rows are duplicates.
select dateCreated, sum(case when r = 1 then 0 else 1 end) as duplicates
from
(
    select email, dateCreated,
           row_number() over (partition by email order by dateCreated) as r
    from #Temp
) b
group by dateCreated
drop table #Temp
Output:
datecreated duplicates
2022-04-21 0
2022-06-08 0
2022-07-12 2
2022-08-16 3
You can compute the difference between the total number of emails on each day and the number of distinct emails on that day:
select created_on,
count(email) - count(distinct email) as dupe_count
from cte
group by created_on
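To make it easier to try the query above against the question's data, here is a minimal self-contained sketch for SQL Server. The `cte` definition is my assumption about what the answer's `cte` contained (the eight sample rows, with `created_on` cast to `date`); the unseparated `yyyymmdd` literals are used only to avoid depending on the session's date format.

```sql
-- Assumed definition of `cte`: the sample rows from the question.
WITH cte AS (
    SELECT email, CAST(created_on AS date) AS created_on
    FROM (VALUES
        ('aaa@email.com', '20220816'),
        ('bbb@email.com', '20220816'),
        ('zzz@email.com', '20220816'),
        ('bbb@email.com', '20220712'),
        ('aaa@email.com', '20220712'),
        ('zzz@email.com', '20220608'),
        ('aaa@email.com', '20220608'),
        ('bbb@email.com', '20220421')
    ) AS v(email, created_on)
)
-- Rows per day minus distinct emails per day = repeats within the same day.
SELECT created_on,
       COUNT(email) - COUNT(DISTINCT email) AS dupe_count
FROM cte
GROUP BY created_on
ORDER BY created_on DESC;
```

On this sample every date comes back with a `dupe_count` of 0, because no email appears twice on the same day. This approach only detects repeats within a single date, not emails that were first created on an earlier date.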
It seems I misunderstood your request; you also want to take earlier creation dates into account:
ct as (
select created_on,
(select case when (select count(*)
from cte t2
where t1.email = t2.email and t1.created_on > t2.created_on
) > 0 then email end) as c
from cte t1)
select created_on,
count(distinct c) as dupe_count
from ct
group by created_on
order by 1
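The fragment above starts in the middle of a `WITH` clause, so here is a hedged, self-contained variant of the same idea for SQL Server. The `cte` rows are assumed to be the question's sample data, `flagged` and `dupe_email` are names I made up for illustration, and `EXISTS` replaces the `count(*) > 0` test, which should be equivalent here.

```sql
-- Assumed definition of `cte`: the sample rows from the question.
WITH cte AS (
    SELECT email, CAST(created_on AS date) AS created_on
    FROM (VALUES
        ('aaa@email.com', '20220816'),
        ('bbb@email.com', '20220816'),
        ('zzz@email.com', '20220816'),
        ('bbb@email.com', '20220712'),
        ('aaa@email.com', '20220712'),
        ('zzz@email.com', '20220608'),
        ('aaa@email.com', '20220608'),
        ('bbb@email.com', '20220421')
    ) AS v(email, created_on)
),
-- Keep the email only when the same email also exists on an earlier date.
flagged AS (
    SELECT t1.created_on,
           CASE WHEN EXISTS (SELECT 1
                             FROM cte AS t2
                             WHERE t2.email = t1.email
                               AND t2.created_on < t1.created_on)
                THEN t1.email
           END AS dupe_email
    FROM cte AS t1
)
-- COUNT(DISTINCT ...) skips the NULLs produced for first occurrences.
SELECT created_on,
       COUNT(DISTINCT dupe_email) AS dupe_count
FROM flagged
GROUP BY created_on
ORDER BY created_on DESC;
```

With the sample rows this should return 3 for 2022-08-16, 2 for 2022-07-12, and 0 for both 2022-06-08 and 2022-04-21, matching the result the question asks for.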
In Oracle, it seems the aggregation can also be done in a single query:
select created_on,
count(distinct case when (select count(*)
from cte t2
where t1.email = t2.email and t1.created_on > t2.created_on
) > 0 then email end) as c
from cte t1
group by created_on
order by 1
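As far as I know, SQL Server does not accept a subquery inside an aggregate's argument (it raises error 130), so the Oracle form above would not run there unchanged. One possible single-statement equivalent, sketched under that assumption, moves the correlated count into `CROSS APPLY`. The `cte` rows are again assumed to be the question's sample data, and `p` and `prior_rows` are hypothetical names.

```sql
-- Assumed definition of `cte`: the sample rows from the question.
WITH cte AS (
    SELECT email, CAST(created_on AS date) AS created_on
    FROM (VALUES
        ('aaa@email.com', '20220816'),
        ('bbb@email.com', '20220816'),
        ('zzz@email.com', '20220816'),
        ('bbb@email.com', '20220712'),
        ('aaa@email.com', '20220712'),
        ('zzz@email.com', '20220608'),
        ('aaa@email.com', '20220608'),
        ('bbb@email.com', '20220421')
    ) AS v(email, created_on)
)
SELECT t1.created_on,
       -- The aggregate now references a plain column instead of a subquery.
       COUNT(DISTINCT CASE WHEN p.prior_rows > 0 THEN t1.email END) AS dupe_count
FROM cte AS t1
CROSS APPLY (
    -- Number of rows with the same email on an earlier date.
    SELECT COUNT(*) AS prior_rows
    FROM cte AS t2
    WHERE t2.email = t1.email
      AND t2.created_on < t1.created_on
) AS p
GROUP BY t1.created_on
ORDER BY t1.created_on DESC;
```

Because the applied subquery is an aggregate without `GROUP BY`, it always yields exactly one row per outer row, so `CROSS APPLY` does not drop first occurrences; they simply get `prior_rows = 0`.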