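I have a table with a few columns, like this:

    ID  RecordID  DateInserted
    1   10        now + 1
    2   10        now + 2
    3   4         now + 3
    4   10        now + 4
    5   10        now + 5

With the rows ordered by DateInserted, I want to delete every duplicate RecordID that is not consecutive with the first run of that value.
In my example, I want to delete records 4 and 5, because there is a record with a different RecordID (record 3) between records 2 and 4.
Is there a way to do this in a single query?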
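You can use window functions. One approach is to count, for each RecordID, how many times a run of that value has started up to the current row, and then keep only the rows that belong to the first run. The running count is partitioned by recordid; counted over the whole table it would already be 2 at record 3, and that row would be dropped as well.

    select t.*
    from (select t.*,
                 -- running number of "value changed" events within each recordid
                 sum(case when prev_recordid = recordid then 0 else 1 end)
                     over (partition by recordid order by dateinserted) as grp_num
          from (select t.*,
                       -- recordid of the previous row in overall date order
                       lag(recordid) over (order by dateinserted) as prev_recordid
                from t
               ) t
         ) t
    where grp_num = 1;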
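To see what the two window functions compute, here is a minimal sketch that runs the inner part of the query on the five sample rows from the question. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions) as a stand-in for the real database, models DateInserted as the integer offsets 1 to 5, and partitions the running count by recordid. The name grouped_rows is purely illustrative.

```python
import sqlite3

# Sample rows from the question as (id, recordid, dateinserted).
# Assumption: "now + n" is modelled as the integer n.
ROWS = [(1, 10, 1), (2, 10, 2), (3, 4, 3), (4, 10, 4), (5, 10, 5)]

# Same shape as the answer's query, but without the final filter so the
# intermediate columns stay visible.
QUERY = """
SELECT id, recordid, prev_recordid, grp_num
FROM (SELECT t.*,
             SUM(CASE WHEN prev_recordid = recordid THEN 0 ELSE 1 END)
                 OVER (PARTITION BY recordid ORDER BY dateinserted) AS grp_num
      FROM (SELECT t.*,
                   LAG(recordid) OVER (ORDER BY dateinserted) AS prev_recordid
            FROM t
           ) t
     ) t
ORDER BY dateinserted
"""


def grouped_rows():
    """Return (id, recordid, prev_recordid, grp_num) for every sample row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, recordid INTEGER, dateinserted INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in grouped_rows():
        print(row)
```

Records 4 and 5 end up with grp_num = 2, because RecordID 10 starts a second run at record 4, so filtering on grp_num = 1 keeps IDs 1, 2, and 3.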
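Another approach is to "flag" the first row where a RecordID appears again (that is, not as its first occurrence) while the previous row holds a different RecordID. Then you simply exclude every row at or beyond that point for that RecordID.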
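    ;WITH cte AS
    (
      SELECT ID, RecordID, DateInserted,
        -- position of the row within its own RecordID
        dr = DENSE_RANK() OVER (PARTITION BY RecordID ORDER BY DateInserted),
        -- RecordID of the previous row overall (the first row compares to itself)
        prior = COALESCE(LAG(RecordID,1) OVER (ORDER BY DateInserted), RecordID)
      FROM dbo.table_name
    ),
    FlaggedRows AS
    (
      -- MIN keeps one flag per RecordID, so a value that reappears
      -- more than once does not duplicate rows in the join below
      SELECT RecordID, dr = MIN(dr)
      FROM cte
      WHERE dr > 1 AND prior <> RecordID
      GROUP BY RecordID
    )
    SELECT cte.ID, cte.RecordID, cte.DateInserted
    FROM cte
    LEFT OUTER JOIN FlaggedRows AS f
      ON cte.RecordID = f.RecordID
    WHERE cte.dr < COALESCE(f.dr, cte.dr + 1)
    ORDER BY cte.DateInserted;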
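The flag logic can be checked on inputs where a RecordID comes back more than once. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server, so the column aliases are written with AS, the column is called prior_id, and the table is simply t. It also takes MIN(dr) per RecordID so that each value yields at most one flag row. The helper kept_ids is a hypothetical name, and IDs and dates are just the positions 1..n.

```python
import sqlite3

# SQLite rendition of the flag approach (assumed dialect changes: AS aliases,
# prior_id instead of prior, table t instead of dbo.table_name).
FLAG_QUERY = """
WITH cte AS (
    SELECT id, recordid, dateinserted,
           DENSE_RANK() OVER (PARTITION BY recordid ORDER BY dateinserted) AS dr,
           COALESCE(LAG(recordid, 1) OVER (ORDER BY dateinserted), recordid) AS prior_id
    FROM t
),
flagged AS (
    -- first position at which each recordid reappears after a different value
    SELECT recordid, MIN(dr) AS dr
    FROM cte
    WHERE dr > 1 AND prior_id <> recordid
    GROUP BY recordid
)
SELECT cte.id
FROM cte
LEFT JOIN flagged AS f ON cte.recordid = f.recordid
WHERE cte.dr < COALESCE(f.dr, cte.dr + 1)
ORDER BY cte.dateinserted
"""


def kept_ids(record_ids):
    """Return the IDs the flag query keeps for RecordIDs given in date order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, recordid INTEGER, dateinserted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(i, rid, i) for i, rid in enumerate(record_ids, start=1)],
    )
    ids = [row[0] for row in conn.execute(FLAG_QUERY)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(kept_ids([10, 10, 4, 10, 10]))     # the question's sample data
    print(kept_ids([10, 10, 4, 10, 4, 10]))  # both values reappear later
```

In both calls only the first run of each RecordID survives, which is IDs 1, 2, and 3.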
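If you want to actually delete the rows from the source table ("remove" is often taken to mean removing them from the result), change the SELECT at the end to: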
    DELETE cte
    FROM cte
    INNER JOIN FlaggedRows f
      ON cte.RecordID = f.RecordID
    WHERE cte.dr >= f.dr;
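Deleting through a CTE like this is SQL Server syntax. As a rough sketch of the same delete in a dialect without that feature, the version below targets SQLite through Python's built-in sqlite3 module (SQLite 3.25 or later assumed) and selects the doomed rows with an IN subquery instead. The table name t, the column prior_id, and the helper remaining_ids_after_delete are illustrative assumptions, not part of the answer.

```python
import sqlite3

# SQLite has no "DELETE cte FROM cte JOIN ...", so the flagged rows are
# collected by ID in a subquery and removed from the base table.
DELETE_SQL = """
WITH cte AS (
    SELECT id, recordid,
           DENSE_RANK() OVER (PARTITION BY recordid ORDER BY dateinserted) AS dr,
           COALESCE(LAG(recordid) OVER (ORDER BY dateinserted), recordid) AS prior_id
    FROM t
),
flagged AS (
    SELECT recordid, MIN(dr) AS dr
    FROM cte
    WHERE dr > 1 AND prior_id <> recordid
    GROUP BY recordid
)
DELETE FROM t
WHERE id IN (SELECT cte.id
             FROM cte
             JOIN flagged AS f ON cte.recordid = f.recordid
             WHERE cte.dr >= f.dr)
"""


def remaining_ids_after_delete(record_ids):
    """Load RecordIDs in date order, run the delete, return the surviving IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, recordid INTEGER, dateinserted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(i, rid, i) for i, rid in enumerate(record_ids, start=1)],
    )
    conn.execute(DELETE_SQL)
    ids = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY dateinserted")]
    conn.close()
    return ids


if __name__ == "__main__":
    print(remaining_ids_after_delete([10, 10, 4, 10, 10]))
```

On the question's sample data this removes records 4 and 5 and leaves IDs 1, 2, and 3 in the table.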