我在SQL Server中有许多列(NOT NULL)
CREATE TABLE Employee (
[ID] INT identity(1,1),
[FirstName] Varchar(100),
[LastName] Varchar(100),
[Country] Varchar(100),
[ColumnX] Varchar(100),
[ColumnY] Varchar(100),
)
Insert into Employee ([FirstName],[LastName],[Country],[ColumnX],[ColumnY] )
values('Raj','Gupta','India','','11'),
('Raj','Gupta','India','bb',''),
('Mohan','Kumar','USA','',''),
('James','Barry','UK','',''),
('James','Barry','UK','','22'),
('James','Barry','UK','aa','');
SELECT * FROM Employee;
我想删除重复的行。删除重复项并添加/合并值后的表
FirstName | LastName | Country | ColumnX | ColumnY | Raj | Gupta | 印度 | bb | 11 |
---|---|---|---|---|
Mohan | Kumar | 美国 | ||
巴里 | 英国 | aa | 22 |
看来你把这事弄得太复杂了。你只需要简单的聚合,MAX
和COUNT
。
SELECT
e.FirstName,
e.LastName,
e.Country,
ColumnX = MAX(e.ColumnX),
ColumnX = MAX(e.ColumnY),
COUNT(*) AS DuplicateCount
FROM Employee e
GROUP BY
e.FirstName,
e.LastName,
e.Country;
,db<的在小提琴
注意空字符串排序最低,所以MAX
不会返回,除非没有其他字符串。
如果你真的想从基表中删除行,那么你可以在CTE中使用窗口函数,并将其连接到MERGE
中的虚拟表
WITH Target AS (
SELECT
e.ID,
e.ColumnX,
e.ColumnY,
NewColumnX = MAX(e.ColumnX) OVER (PARTITION BY e.FirstName, e.LastName, e.Country),
NewColumnY = MAX(e.ColumnY) OVER (PARTITION BY e.FirstName, e.LastName, e.Country),
rn = ROW_NUMBER() OVER (PARTITION BY e.FirstName, e.LastName, e.Country ORDER BY e.ID)
FROM Employee e
)
MERGE Target t
USING (VALUES(1)) s(dummy)
ON 1=1
WHEN MATCHED AND t.rn > 1
THEN DELETE
WHEN MATCHED THEN UPDATE
SET ColumnX = t.NewColumnX,
ColumnY = t.NewColumnY
;