SQL:有效地将增量数字附加到字符串中,避免重复



我有一组记录(表[#tmp_origin](,其中包含字符串字段中的重复条目([名称](。我想将 [#tmp_origin] 的全部内容插入到目标表 [#tmp_destination] 中,该表不允许重复并且可能已经包含项目。

如果源表中的字符串在目标表中不存在,则 in 将按原样插入到目标表中。如果目标表中的条目已存在,其值与原始表中的条目值相同,则必须先在字符串后附加一个字符串化的增量数字,然后才能将其插入目标表中。

在此示例脚本中,以这种方式移动数据的过程已使用游标实现:

-- create initial situation (origin and destination table, both containing items)
-- Begin
CREATE TABLE [#tmp_origin] ([Names] VARCHAR(10))
CREATE TABLE [#tmp_destination] ([Names] VARCHAR(10))
CREATE UNIQUE INDEX [IX_UniqueName] ON [#tmp_destination]([Names] ASC)
INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
INSERT INTO [#tmp_origin]([Names]) VALUES ('b')
INSERT INTO [#tmp_origin]([Names]) VALUES ('c')
INSERT INTO [#tmp_destination]([Names]) VALUES ('a')
INSERT INTO [#tmp_destination]([Names]) VALUES ('a_1')
INSERT INTO [#tmp_destination]([Names]) VALUES ('b')
-- create initial situation - End
DECLARE @Name VARCHAR(10)
DECLARE NamesCursor CURSOR LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY FOR
    SELECT [Names]
    FROM [#tmp_origin];
OPEN NamesCursor;
FETCH NEXT FROM NamesCursor INTO @Name;
 
WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @finalName VARCHAR(10)
    SET @finalName = @Name
    DECLARE @counter INT
    SET @counter = 1
    WHILE(1=1)
    BEGIN
        IF NOT EXISTS(SELECT * FROM [#tmp_destination] WHERE [Names] = @finalName)
            BREAK;
        SET @finalName = @Name + '_' + CAST(@counter AS VARCHAR)
        SET @counter = @counter + 1
    END
    INSERT INTO [#tmp_destination] ([Names]) (
        SELECT @finalName
    )
    FETCH NEXT FROM NamesCursor INTO @Name;
END
 
CLOSE NamesCursor;
DEALLOCATE NamesCursor;
SELECT *
FROM [#tmp_destination]
/*
Expected result:
a
a_1
a_2
a_3
b
b_1
c
*/
DROP TABLE [#tmp_origin]
DROP TABLE [#tmp_destination]

这工作正常,但当要插入的项目数增加时,其性能会大大降低。

有什么想法可以加快速度吗?

谢谢

使用窗口函数允许对重复项进行编号。您还可以从目标表中获取计数(需要 where 条件来去除您添加的后缀(:

select orig.names,
       row_number() over (partition by orig.names order by orig.names) as rowNo,
       dest.count
from ##tmp_origin orig
  cross apply (select count(1) from #tmp_destination where names = orig.names) as dest

可以从上面构建insert(如果大于零,则rowNo + dest.count -1新后缀(。

建议您重构目标临时表,将名称和后缀作为单独的列包含在内 - 这可能意味着有一个新的中间阶段 - 因为这将使匹配逻辑更简单。

像这样:

insert  [#tmp_destination]
select  CASE WHEN row_number() over(partition by Names order by Names) > 1 THEN Names + '_' + CONVERT(VARCHAR(10), row_number() over(partition by Names order by Names)) ELSE Names END
from    [#tmp_origin]
在这种情况下我

不会使用游标。相反,我会使用 ROW_NUMBER() 构建查询。这样,您可以在原始表中添加一个计数器,然后使用此计数器追加到您的 [名称]:

SELECT [Names], ROW_NUMBER() OVER (PARTITION BY [Names] ORDER BY [Names]) - 1 AS [counter]
INTO #tmp_origin_with_counter
FROM #tmp_origin
SELECT CONCAT([Names], IIF([counter] = 0, '', '_'+ CAST([counter] AS NVARCHAR)))
INTO #tmp_destination
FROM #tmp_origin_with_counter

最新更新