T-SQL: mark duplicate records with a single unique ID



There must be a better way to do this.

I have a table in which I mark all the unique records like this:

    WITH
    CTE( TransId, OriginalName, StrippedName, RowNumber )
    AS
    (
        SELECT  TransID ,
                Name ,
                StrippedName,
                -- order by TransID so that row 1 of each group is deterministically the lowest TransID
                RN = ROW_NUMBER() OVER ( PARTITION BY StrippedName ORDER BY TransID )
        FROM    dbo.Members
    )
    UPDATE  dbo.Members
    SET ParenId = TransID
    WHERE   TransID IN ( SELECT TransId FROM CTE WHERE RowNumber = 1 )
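Here is a minimal sketch of this marking step. It runs against an in-memory SQLite database through Python's standard `sqlite3` module, which stands in for SQL Server here. It assumes SQLite 3.25 or later for window functions, and the sample rows are taken from the test script in this post. The window is ordered by `TransId`. Ordering by `StrippedName`, which has the same value for every row in a partition, would leave the choice of row 1 arbitrary. Ordering by `TransId` makes the lowest `TransId` of each group the one that gets marked.

```python
import sqlite3

# Sample rows from the test script: (TransId, Name, StrippedName)
ROWS = [
    (2, "name 1", "name1"), (3, "name 1", "name1"), (6, "name 1", "name1"),
    (8, "name 2", "name2"), (11, "name 2", "name2"),
    (15, "name 55", "name55"), (23, "name 55", "name55"),
    (99, "name 100", "name100"), (122, "name 108", "name108"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (TransId INTEGER PRIMARY KEY, Name TEXT, "
    "StrippedName TEXT, ParenId INTEGER)"
)
conn.executemany(
    "INSERT INTO Members (TransId, Name, StrippedName) VALUES (?, ?, ?)", ROWS
)

# Number the rows inside each StrippedName group by TransId and make the
# first row of every group its own parent; all other rows keep ParenId = NULL.
conn.execute("""
    WITH cte AS (
        SELECT TransId,
               ROW_NUMBER() OVER (PARTITION BY StrippedName ORDER BY TransId) AS rn
        FROM Members
    )
    UPDATE Members
    SET ParenId = TransId
    WHERE TransId IN (SELECT TransId FROM cte WHERE rn = 1)
""")

marked = conn.execute(
    "SELECT TransId, ParenId FROM Members ORDER BY TransId"
).fetchall()
print(marked)
```

After this step exactly one row per group (2, 8, 15, 99 and 122 in the sample) has a non-NULL `ParenId`. The second step relies on that.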

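Now I want to update all the duplicate records (where RowNumber > 1) with the unique record's ParentId. At the moment I'm using a UDF. It works, but it takes a very long time. Here is the UDF: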

    CREATE FUNCTION dbo.getParentTransId ( @TransId INT, @strippedBusName VARCHAR(200) )
    RETURNS INT
    AS
    BEGIN
        DECLARE @ParentTransId INT
        SELECT  @ParentTransId = TransId
        FROM    dbo.Members
        WHERE   StrippedBusName = @strippedBusName
          AND   ParenId IS NOT NULL
        IF  @ParentTransId IS NULL
        BEGIN
            SET @ParentTransId = @TransId
        END
        RETURN @ParentTransId
    END
    GO

Here is the UPDATE statement I'm currently using:

    UPDATE  dt
    SET dt.ParenId = dbo.getParentTransId ( dt.TransID, dt.StrippedBusName )
    FROM    dbo.Members dt
    GO
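A likely reason this is slow is that SQL Server generally invokes a scalar UDF once per row. Scalar UDF inlining in SQL Server 2019 and later can change that. Each invocation runs its own lookup query against `dbo.Members`, so the work grows with the number of rows times the cost of one lookup. The sketch below reproduces that row-by-row pattern on SQLite through Python's `sqlite3`, purely as an illustration. `get_parent_trans_id` is a hypothetical Python counterpart of `dbo.getParentTransId`. The grouping column is called `StrippedName` here (the UDF calls it `StrippedBusName`), and the sample rows are assumed to be already marked by the ROW_NUMBER step.

```python
import sqlite3

# Members after the ROW_NUMBER step: (TransId, StrippedName, ParenId)
ROWS = [
    (2, "name1", 2), (3, "name1", None), (6, "name1", None),
    (8, "name2", 8), (11, "name2", None),
    (15, "name55", 15), (23, "name55", None),
    (99, "name100", 99), (122, "name108", 122),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (TransId INTEGER PRIMARY KEY, StrippedName TEXT, ParenId INTEGER)"
)
conn.executemany("INSERT INTO Members VALUES (?, ?, ?)", ROWS)


def get_parent_trans_id(trans_id, stripped_name):
    """Hypothetical mirror of dbo.getParentTransId: one lookup query per call."""
    row = conn.execute(
        "SELECT TransId FROM Members WHERE StrippedName = ? AND ParenId IS NOT NULL",
        (stripped_name,),
    ).fetchone()
    # Fall back to the row's own TransId when no marked row exists in the group
    return row[0] if row is not None else trans_id


# Call the function once per row (one query each), then write all results,
# so every lookup sees the table as it was before the update.
all_rows = conn.execute("SELECT TransId, StrippedName FROM Members").fetchall()
new_parents = {tid: get_parent_trans_id(tid, name) for tid, name in all_rows}
lookups = len(new_parents)  # one lookup query per row in the table
conn.executemany(
    "UPDATE Members SET ParenId = ? WHERE TransId = ?",
    [(parent, tid) for tid, parent in new_parents.items()],
)

result = conn.execute("SELECT TransId, ParenId FROM Members ORDER BY TransId").fetchall()
print(lookups, result)
```

A set-based UPDATE can compute the same values with a single join or correlated subquery. That lets the optimizer choose one plan for the whole table instead of running one query per row.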

Is there a way to do the same update without using a UDF?

Below is a test script showing that @gotqn's answer works:

    -- create table
    CREATE TABLE DupUpdateTest
    (
        Id      INT NOT NULL PRIMARY KEY IDENTITY( 1, 1 ),
        TransId INT NOT NULL,
        OriginalName    VARCHAR(MAX),
        StrippedName    VARCHAR(MAX),
        ParentId    INT NULL
    )
    GO
    -- insert data
    INSERT  INTO dbo.DupUpdateTest VALUES (2, 'name 1', 'name1', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (3, 'name 1', 'name1', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (6, 'name 1', 'name1', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (8, 'name 2', 'name2', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (11, 'name 2', 'name2', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (15, 'name 55', 'name55', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (23, 'name 55', 'name55', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (99, 'name 100', 'name100', NULL)
    INSERT  INTO dbo.DupUpdateTest VALUES (122, 'name 108', 'name108', NULL)
    -- update parent id match
    -- note: there is no dt1.ParentId IS NOT NULL filter here, so each row joins to every row
    -- with the same StrippedName, and which TransId gets written is not guaranteed
    UPDATE dt
    SET dt.ParentId = COALESCE(dt1.TransId, dt.TransId ) -- Returns the first nonnull expression among its arguments
    FROM  dbo.DupUpdateTest dt
    LEFT JOIN dbo.DupUpdateTest dt1
        ON dt.StrippedName   = dt1.StrippedName
    -- view results
    SELECT * FROM dbo.DupUpdateTest
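In this script every `ParentId` is still NULL because the ROW_NUMBER marking step is not run. The join therefore has no `dt1.ParentId IS NOT NULL` condition, and each row joins to every row with the same `StrippedName`, itself included. SQL Server's UPDATE documentation describes the result as undefined when a target row matches more than one source row in the FROM clause. The script can show the expected values without that outcome being guaranteed.

The sketch below uses Python's `sqlite3` as a stand-in for SQL Server. It first counts the matches per row and then applies one deterministic alternative: a correlated `MIN()` subquery, which is also valid T-SQL. It assumes the lowest `TransId` of each group should be the parent.

```python
import sqlite3

# Same data as the test script: (TransId, StrippedName); ParentId starts NULL.
ROWS = [
    (2, "name1"), (3, "name1"), (6, "name1"),
    (8, "name2"), (11, "name2"),
    (15, "name55"), (23, "name55"),
    (99, "name100"), (122, "name108"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DupUpdateTest (TransId INTEGER NOT NULL, StrippedName TEXT, ParentId INTEGER)"
)
conn.executemany("INSERT INTO DupUpdateTest (TransId, StrippedName) VALUES (?, ?)", ROWS)

# How many dt1 rows does each dt row join to under the test script's ON clause?
match_counts = dict(conn.execute("""
    SELECT dt.TransId, COUNT(dt1.TransId)
    FROM DupUpdateTest dt
    LEFT JOIN DupUpdateTest dt1 ON dt.StrippedName = dt1.StrippedName
    GROUP BY dt.TransId
""").fetchall())

# Deterministic variant: the lowest TransId in each group becomes the parent.
# Every row matches at least itself, so MIN() is never NULL and COALESCE is unnecessary.
conn.execute("""
    UPDATE DupUpdateTest
    SET ParentId = (SELECT MIN(dt1.TransId)
                    FROM DupUpdateTest dt1
                    WHERE dt1.StrippedName = DupUpdateTest.StrippedName)
""")
parents = conn.execute(
    "SELECT TransId, ParentId FROM DupUpdateTest ORDER BY TransId"
).fetchall()
print(match_counts)
print(parents)
```

Any row with a count above 1 is a row for which the original UPDATE ... FROM had several candidate values.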

I think this should work:

    UPDATE  dt
    SET dt.ParenId = COALESCE( (SELECT p.TransID FROM dbo.Members p WHERE p.StrippedBusName = dt.StrippedBusName AND p.ParenId IS NOT NULL), dt.TransId )
    FROM    dbo.Members dt
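Below is a sketch of this correlated-subquery form on SQLite through Python's `sqlite3`. It assumes the ROW_NUMBER step has already marked one row per group (the sample rows are set up that way) and uses `StrippedName` for the grouping column. The sketch also wraps the subquery in `MIN()`, which the answer above does not. In SQL Server, a scalar subquery that returns more than one row raises an error, and `MIN()` keeps the statement valid if a group ever ends up with more than one marked row.

```python
import sqlite3

# State after the ROW_NUMBER step: (TransId, StrippedName, ParenId)
ROWS = [
    (2, "name1", 2), (3, "name1", None), (6, "name1", None),
    (8, "name2", 8), (11, "name2", None),
    (15, "name55", 15), (23, "name55", None),
    (99, "name100", 99), (122, "name108", 122),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (TransId INTEGER PRIMARY KEY, StrippedName TEXT, ParenId INTEGER)"
)
conn.executemany("INSERT INTO Members VALUES (?, ?, ?)", ROWS)

# One set-based statement: use the marked row of the same group, or fall back
# to the row's own TransId when the group has no marked row.
conn.execute("""
    UPDATE Members
    SET ParenId = COALESCE(
        (SELECT MIN(p.TransId)
         FROM Members p
         WHERE p.StrippedName = Members.StrippedName
           AND p.ParenId IS NOT NULL),
        Members.TransId)
""")

result = conn.execute("SELECT TransId, ParenId FROM Members ORDER BY TransId").fetchall()
print(result)
```

The result matches what the UDF-based update produces for the same data, without a function call per row.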

I'm not sure, but the following should be faster:

    UPDATE  dt
    SET dt.ParenId = COALESCE(dt1.TransId, dt.TransId )
    FROM  dbo.Members dt
    LEFT JOIN dbo.Members dt1
        ON dt.StrippedBusName   = dt1.StrippedBusName
        AND  dt1.ParenId IS NOT NULL
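With `dt1.ParenId IS NOT NULL` in the ON clause, each row can join to at most one row: the one marked by the ROW_NUMBER step. The UPDATE ... FROM therefore has a single value per target row. This depends on that step having run first. If every `ParenId` were still NULL, the LEFT JOIN would find nothing and COALESCE would make every row its own parent.

A cheap check before running the UPDATE is to execute the same join as a SELECT. The sketch below does that on SQLite through Python's `sqlite3`, as a stand-in for SQL Server. It uses `StrippedName` for the grouping column, assumes the sample rows are already marked, and reports any row with more than one candidate parent.

```python
import sqlite3

# State after the ROW_NUMBER step: (TransId, StrippedName, ParenId)
ROWS = [
    (2, "name1", 2), (3, "name1", None), (6, "name1", None),
    (8, "name2", 8), (11, "name2", None),
    (15, "name55", 15), (23, "name55", None),
    (99, "name100", 99), (122, "name108", 122),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (TransId INTEGER PRIMARY KEY, StrippedName TEXT, ParenId INTEGER)"
)
conn.executemany("INSERT INTO Members VALUES (?, ?, ?)", ROWS)

# Run the UPDATE's FROM/JOIN as a SELECT: it shows the value each row would
# receive and how many dt1 rows it matches (should be at most 1).
preview = conn.execute("""
    SELECT dt.TransId,
           COALESCE(dt1.TransId, dt.TransId) AS NewParenId,
           COUNT(*) OVER (PARTITION BY dt.TransId) AS Matches
    FROM Members dt
    LEFT JOIN Members dt1
      ON dt.StrippedName = dt1.StrippedName
     AND dt1.ParenId IS NOT NULL
    ORDER BY dt.TransId
""").fetchall()

ambiguous = [row for row in preview if row[2] > 1]
print(preview)
print("rows with more than one match:", ambiguous)
```

An empty `ambiguous` list means the UPDATE ... FROM has exactly one source row per target row for this data.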

Latest update