我在SQL Server 2014(Developer Edition,x64,Windows 10 Pro x64)中有一个包含50亿行的表:
CREATE TABLE TestTable
(
ID BIGINT IDENTITY(1,1),
PARENT_ID BIGINT NOT NULL,
CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED (ID)
);
CREATE NONCLUSTERED INDEX IX_TestTable_ParentId
ON TestTable (PARENT_ID);
我正在尝试应用以下补丁:
-- Create non-nullable column with default (should be online operation in Enterprise/Developer edition)
ALTER TABLE TestTable
ADD ORDINAL TINYINT NOT NULL CONSTRAINT DF_TestTable_Ordinal DEFAULT 0;
GO
-- Populate column value for existing data
BEGIN
SET NOCOUNT ON;
DECLARE @BATCH_SIZE BIGINT = 1000000;
DECLARE @COUNTER BIGINT = 0;
DECLARE @ROW_ID BIGINT;
DECLARE @ORDINAL BIGINT;
DECLARE ROWS_C CURSOR
LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY
FOR
SELECT
ID AS ID,
ROW_NUMBER() OVER (PARTITION BY PARENT_ID ORDER BY ID ASC) AS ORDINAL
FROM
TestTable;
OPEN ROWS_C;
FETCH NEXT FROM ROWS_C
INTO @ROW_ID, @ORDINAL;
BEGIN TRANSACTION;
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE TestTable
SET
ORDINAL = CAST(@ORDINAL AS TINYINT)
WHERE
ID = @ROW_ID;
FETCH NEXT FROM ROWS_C
INTO @ROW_ID, @ORDINAL;
SET @COUNTER = @COUNTER + 1;
IF @COUNTER = @BATCH_SIZE
BEGIN
COMMIT TRANSACTION;
SET @COUNTER = 0;
BEGIN TRANSACTION;
END;
END;
COMMIT TRANSACTION;
CLOSE ROWS_C;
DEALLOCATE ROWS_C;
SET NOCOUNT OFF;
END;
GO
-- Drop default constraint from the column
ALTER TABLE TestTable
DROP CONSTRAINT DF_TestTable_Ordinal;
GO
-- Drop IX_TestTable_ParentId index
DROP INDEX IX_TestTable_ParentId
ON TestTable;
GO
-- Create IX_TestTable_ParentId_Ordinal index
CREATE UNIQUE INDEX IX_TestTable_ParentId_Ordinal
ON TestTable (PARENT_ID, ORDINAL);
GO
补丁的目的是添加一个名为ORDINAL的列,它是同一父级(由parent_ID定义)内记录的序号。该修补程序使用SQLCMD运行。
补丁是这样做的,原因有很多:
- 表太大,无法在其上运行单个UPDATE语句(在事务日志/tempdb中占用大量时间和空间)
- 使用具有TOP n行的单个UPDATE语句进行批量更新并不容易实现(如果我们在例如1m行的批量中更新表,则1000001th行可能与1000000th属于相同的PARENT_ID,这将导致错误的序号分配给1000001th记录)。换句话说,在游标中运行的SELECT语句应该运行一次(不分页),或者应该应用更复杂的操作(联接/条件)
- 添加NULL列并稍后将其更改为NOT NULL不是一个好的解决方案,因为我使用SNAPSHOT隔离(将在将列更改为NOT NULL时执行完整的表更新)
该补丁在一个有数百万行的小型数据库上运行得很好,但当应用于一个有数十亿行的数据库时,我得到:
消息3606,级别16,状态2,服务器XXX,第22行
发生算术溢出。
我的第一个猜测是ORDINAL值太大,无法放入TINYINT列,但事实并非如此。我创建了一个具有类似结构的测试数据库,并填充了数据(每个父级超过255行)。我收到的错误消息仍然是算术异常,但有不同的消息代码和不同的措辞(明确表示无法将数据放入TINYINT)。
目前我有一些怀疑,但我还没有找到任何能帮助我的东西:
- CURSOR无法处理超过MAX(INT32)行的数据
- SQLCMD施加了限制
你对可能出现的问题有什么想法吗?
使用While循环但确保将相同的parent_id放在一起怎么样:
DECLARE @SegmentSize BIGINT = 1000000
DECLARE @CurrentSegment BigInt = 0
WHILE 1 = 1
BEGIN
;With UpdateData As
(
SELECT ID AS ID,
ROW_NUMBER() OVER (PARTITION BY PARENT_ID ORDER BY ID ASC) AS ORDINAL
FROM TestData
WHERE ID > @CurrentSegment AND ID <= (@CurrentSegment + @SegmentSize)
)
UPDATE TestData
SET Ordinal = UpdateDate.Ordinal
FROM TestData
INNER JOIN UpdateData ON TestData.Id = UpdateData.Id
IF @@ROWCOUNT = 0
BEGIN
BREAK
END
SET @CurrentSegment = @CuurentSegment + @SegmentSize
END
编辑-根据要求修改为Parent_Id上的分段。这应该是Parent_id被索引时相当快(添加了Option(Recomple)以确保实际值用于查找。因为您没有更新这将限制事务日志的增长!
DECLARE @SegmentSize BIGINT = 1000000
DECLARE @CurrentSegment BigInt = 0
WHILE 1 = 1
BEGIN
;With UpdateData As
(
SELECT ID AS ID,
ROW_NUMBER() OVER (PARTITION BY PARENT_ID ORDER BY ID ASC) AS ORDINAL
FROM TestData
WHERE Parent_ID > @CurrentSegment AND
Parent_ID <= (@CurrentSegment + @SegmentSize)
)
UPDATE TestData
SET Ordinal = UpdateDate.Ordinal
FROM TestData
INNER JOIN UpdateData ON TestData.Id = UpdateData.Id
OPTION (RECOMPILE)
IF @@ROWCOUNT = 0
BEGIN
BREAK
END
SET @CurrentSegment = @CuurentSegment + @SegmentSize
END