Given the following table definition:
CREATE TABLE dbo.Test
(
[ID] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
[Code] CHAR(10) NOT NULL UNIQUE NONCLUSTERED
);
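To de-identify a data set, I need to randomize the Code values.
I want to keep the ID values so that other tables don't need to change.
I tried the following: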
SELECT t.ID, c.Code
FROM dbo.Test AS t
CROSS APPLY (SELECT TOP(1) v.Code
FROM dbo.Test AS v
WHERE v.Code <> t.Code
ORDER BY NEWID()) AS c
The problem is that the same unique key value gets assigned to more than one record.
Any suggestions on how to solve this?
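Here's how I would do it. In your CROSS APPLY, each row draws its random Code independently, so nothing stops two rows from drawing the same value. Instead, shuffle the codes once and pair them with the IDs by row number: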
-- Sample data
DECLARE @test TABLE (ID INT IDENTITY, [Code] CHAR(10));
INSERT @test
VALUES (REPLICATE('A',10)),(REPLICATE('B',10)),(REPLICATE('C',10)),(REPLICATE('D',10));
-- Solution
SELECT t.ID, c.[Code]
FROM @test AS t
JOIN
(
SELECT SK = ROW_NUMBER() OVER (ORDER BY NEWID()), t.[Code]
FROM @test AS t
) AS c ON c.SK = t.ID;
Returns:
ID Code
----------- ----------
1 DDDDDDDDDD
2 AAAAAAAAAA
3 BBBBBBBBBB
4 CCCCCCCCCC
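(Your results will likely differ, which is the point.)
Quick update:
The join above relies on ID running from 1 to n with no gaps. If your IDs are not contiguous (for example, because some rows have been deleted), you could number the IDs as well and join on the two row numbers: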
WITH T1 AS (SELECT t.ID, SK = ROW_NUMBER() OVER (ORDER BY t.ID) FROM @test AS t),
T2 AS (SELECT SK = ROW_NUMBER() OVER (ORDER BY NEWID()), t.[Code] FROM @test AS t)
SELECT t1.ID, t2.Code
FROM T1
JOIN T2 ON T1.SK = T2.SK;
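To persist the shuffle rather than just select it, something along these lines might work. This is a sketch, not tested against your data: it targets dbo.Test from the question, and the temp table #Shuffled and the variable @n are names I made up for illustration. It also restores the intent of your WHERE v.Code <> t.Code filter: a plain shuffle can hand a row its own Code back, so each row instead takes the Code of the next row in one random ordering, wrapping around at the end. The ordering is materialized first because a CTE containing NEWID() may be evaluated separately for each reference, which would give the two sides of the join different shuffles.

```sql
-- Materialize a single random ordering (SK = 1..n, gaps in ID do not matter).
-- #Shuffled is a hypothetical name chosen for this sketch.
SELECT t.ID,
       t.[Code],
       SK = ROW_NUMBER() OVER (ORDER BY NEWID())
INTO #Shuffled
FROM dbo.Test AS t;

-- @n is a hypothetical helper variable holding the row count.
DECLARE @n INT = (SELECT COUNT(*) FROM #Shuffled);

-- Each row takes the Code of its successor in the random ordering;
-- the last row (SK = @n) wraps around to the first (SK = 1).
UPDATE t
SET    t.[Code] = nxt.[Code]
FROM   dbo.Test  AS t
JOIN   #Shuffled AS cur ON cur.ID = t.ID
JOIN   #Shuffled AS nxt ON nxt.SK = cur.SK % @n + 1;

DROP TABLE #Shuffled;
```

Because every Code is handed to exactly one other row, the values stay unique, and with more than one row no ID keeps its original Code. The whole reassignment happens in a single UPDATE, and SQL Server validates the unique constraint once the statement completes, so the intermediate swaps should not raise a violation. Try it inside a transaction you can roll back first.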