我有相当多的CSV文件,我想导入到SQL Server数据库。这些文件中的每个都包含超过5000万行,我需要将它们导入到一个名为ControlExperimentTable
的表中。
在接下来的几个星期里,总共有500个这样的文件将被发送给我导入数据库,所以我写了一个脚本来自动化这个过程,这显然是漫长而繁琐的。
简单地说,这个过程是这样的——CSV文件被复制到一个存储库文件夹中进行处理。所有文件都有相同的前缀和唯一的后缀,例如ExportData0001.txt
,ExportData0002.txt
,ExportData0003.txt
,ExportData0004.txt
等。
脚本依次处理每个文件,将其内容导入SQL Server数据库。在导入了所有行之后,将处理过的文件移到存档文件夹中,进行完整的数据库备份,然后继续处理下一个文件。
下面是我用来完成任务的代码:
--Int variable declaration section.
DECLARE @totalFilesToProcess INT = 500 /*The number of files in the repository.*/
DECLARE @count INT = 0
--VarChar variable declaration section.
DECLARE @sqlBulkInsertCommand VARCHAR(255)
DECLARE @sqlMoveCommand VARCHAR(255)
DECLARE @sourceFilename VARCHAR(255)
DECLARE @errorFilename VARCHAR(255)
DECLARE @suffix VARCHAR(4)
--Ensure that the xp_cmdshell server configuration option is enabled.
EXEC master.dbo.sp_configure 'show advanced options', 1
RECONFIGURE
EXEC master.dbo.sp_configure 'xp_cmdshell', 1
RECONFIGURE
--Change to the target database instance.
USE [DataCollectionTable]
--Set the script to loop the number of times as there are as may files to import
WHILE (@count <= @totalFilesToProcess)
BEGIN
--Set variables.
SET @count = @count + 1
SET @suffix = RIGHT('0000' + CAST(@count AS VARCHAR(4)), 4)
SET @errorFilename = FORMATMESSAGE('E:SharedDocsControlExportData%s.csv', @suffix)
SET @sourceFilename = FORMATMESSAGE('E:SharedDocsControlExportData%s.txt', @suffix)
--COMMAND CONSTRUCT: Insert data from flat file into the ControlExperimentTable table.
SET @sqlBulkInsertCommand = FORMATMESSAGE('BULK INSERT ControlExperimentTable FROM ''%s'' WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''n'', ERRORFILE = ''%s'', TABLOCK)', @sourceFilename, @errorFilename)
--COMMAND CONSTRUCT: Move the source file to the archive folder.
SET @sqlMoveCommand = FORMATMESSAGE('MOVE E:SharedDocsControlExportData%s.txt E:SharedDocsArchiveControlExportData%s.txt', @suffix, @suffix)
--Display record count before every update.
SELECT sys.sysindexes.rows FROM sys.sysindexes INNER JOIN sys.sysobjects
ON sys.sysobjects.id=sys.sysindexes.id
WHERE sys.sysindexes.first IS NOT NULL AND sys.sysobjects.name = 'ControlExperimentTable'
--Execute the BULK INSERT command.
EXEC @sqlBulkInsertCommand
--Execute the source file MOVE command.
EXEC master.dbo.xp_cmdshell @sqlMoveCommand
--Backup database after each commitment, maintaining the current and previous copies.
IF (@count % 2) = 0
BEGIN
BACKUP DATABASE [ExperimentDataCollection] TO DISK = N'E:BackupsExperimentDataCollectionBackup-01.bak' WITH NOFORMAT, INIT, NAME = N'ExperimentDataCollection-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
END ELSE
BEGIN
BACKUP DATABASE [ExperimentDataCollection] TO DISK = N'E:BackupsExperimentDataCollectionBackup-02.bak' WITH NOFORMAT, INIT, NAME = N'ExperimentDataCollection-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
END
END
运行此代码后,我得到以下错误:
配置选项'show advanced options'从1更改为1。运行RECONFIGURE语句进行安装。
配置选项'xp_cmdshell'从1更改为1。运行RECONFIGURE语句安装Msg 911, Level 16, State 4, Line 48
数据库'BULK INSERT ControlExperimentTable FROM 'E:SharedDocsExportData0001'不存在。请确保名称输入正确。完成时间:2019-02-18 t18:06:15 .0003210+02:00
请注意,所有东西都在它们应该在的地方,并在代码中正确指定。如此之多,以至于手动运行以下代码,这正是我试图解析的工作没有麻烦。
USE [ExperimentDataCollection]
BULK INSERT [ExperimentDataCollection].[dbo].[ControlExperimentTable]
FROM 'E:SharedDocsControlExportData0001.txt'
WITH (FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = 'n',
ERRORFILE = 'E:SharedDocsControlExportData0001.csv',
TABLOCK)
我哪里做错了?
在执行动态sql时,需要将变量用圆括号括起来。所以修改这个:
EXEC @sqlBulkInsertCommand
:
EXEC(@sqlBulkInsertCommand)