Removing duplicates over time



We have an "audit log" table that stores the price of each item every day. Here is some sample data:

1 - Item 587 - $19.99 - 12/1/2012
2 - Item 587 - $19.99 - 12/2/2012
3 - Item 587 - $18.99 - 12/3/2012
4 - Item 587 - $18.99 - 12/4/2012
5 - Item 587 - $20.99 - 12/5/2012
6 - Item 587 - $19.99 - 12/6/2012
7 - Item 587 - $19.99 - 12/7/2012
8 - Item 587 - $17.99 - 12/8/2012
9 - Item 587 - $17.99 - 12/9/2012
10 - Item 587 - $17.99 - 12/10/2012

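This works fine, but the table is growing too quickly. We now need a T-SQL script to delete the duplicate rows. After the script runs, the data set should look like this: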

1 - Item 587 - $19.99 - 12/1/2012
3 - Item 587 - $18.99 - 12/3/2012
5 - Item 587 - $20.99 - 12/5/2012
6 - Item 587 - $19.99 - 12/6/2012
8 - Item 587 - $17.99 - 12/8/2012

It seems like this needs a GROUP BY or DISTINCT, but I'm not sure how to get started.

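-- Number the rows for each item, newest first (DupNumber = 1 is the latest row).
;WITH DuplicateTableWithDupNumbers AS
(
    SELECT recid, itemid, price, [date],
        ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY [date] DESC) AS DupNumber
    FROM YourTable
)
-- Delete every row that is not the first in its partition.
DELETE FROM YourTable
WHERE EXISTS
(
    SELECT 1
    FROM DuplicateTableWithDupNumbers AS dup
    WHERE dup.DupNumber > 1 AND dup.recid = YourTable.recid
);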

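You can change the PARTITION BY clause as needed. This deletes directly from your table in a single query. Note that, as written, it keeps only the most recent row for each itemid, so adjust the PARTITION BY and ORDER BY clauses to match the rows you want to keep.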
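One possible variation, offered as a sketch rather than as the answerer's own method: compare each row's price with the previous row for the same item and delete the row when the two match. This assumes SQL Server 2012 or later (for LAG) and reuses the placeholder names YourTable, recid, itemid, price and date from the query above.

```sql
-- Sketch only: YourTable and its columns are the placeholder names used above.
-- Requires SQL Server 2012+ because of LAG.
;WITH PricesWithPrevious AS
(
    SELECT recid,
           price,
           -- Price of the previous row (by date) for the same item; NULL for the first row.
           LAG(price) OVER (PARTITION BY itemid ORDER BY [date]) AS PrevPrice
    FROM YourTable
)
-- Delete only the rows whose price repeats the immediately preceding price.
DELETE FROM YourTable
WHERE recid IN
(
    SELECT recid
    FROM PricesWithPrevious
    WHERE price = PrevPrice
);
```

On the sample data in the question this removes rows 2, 4, 7, 9 and 10 and leaves rows 1, 3, 5, 6 and 8. The first row of each item is always kept, because comparing its price with a NULL PrevPrice never evaluates to true.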

Assuming the table contains these values, you need something like this:

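-- Keep the earliest row for each (item, price) pair.
-- [table] is a placeholder; TABLE is a reserved word, so it is bracketed here.
select t.id, t.item, t.price, t.[date]
from [table] t inner join
(select item, price, min([date]) as min_date
from [table]
group by item, price) g on
t.item = g.item and
t.price = g.price and
t.[date] = g.min_date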
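The query above only selects the rows to keep. A minimal sketch of the matching DELETE, assuming the same placeholder table and column names (with the table name bracketed because TABLE is a reserved word), might look like this:

```sql
-- Sketch only: [table], item, price and [date] are the placeholder names from the query above.
-- Deletes a row when an earlier row exists with the same item and price.
DELETE t
FROM [table] AS t
WHERE EXISTS
(
    SELECT 1
    FROM [table] AS earlier
    WHERE earlier.item = t.item
      AND earlier.price = t.price
      AND earlier.[date] < t.[date]
);
```

Be aware that this treats every later occurrence of a price as a duplicate, even when a different price came in between. On the sample data it would also delete row 6, because $19.99 already appears in row 1.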

Use a recursive CTE to check for identical values:

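-- Assumes Id values are consecutive, start at 1, and all belong to one item.
;WITH cte AS
 (
  -- Anchor: the first row is never a duplicate.
  SELECT Id, ItemId, Price, [date],
         0 AS ChPrice
  FROM dbo.audit_log
  WHERE Id = 1
  UNION ALL
  -- Flag a row with its own Id when its price equals the previous row's price.
  SELECT a.Id, a.ItemId, a.Price, a.[date],
         CASE WHEN a.Price = c.Price THEN a.Id ELSE 0 END
  FROM dbo.audit_log a JOIN cte c ON a.Id = c.Id + 1
  )
DELETE a
FROM dbo.audit_log a JOIN cte c ON a.Id = c.ChPrice
-- The default recursion limit is 100; 0 removes it for longer tables.
OPTION (MAXRECURSION 0)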

Demo on SQLFiddle
