在TSQL中分割字符串文本



我有下面的字符串需要拆分,以允许插入到表中。

数据来自:

http://smartgriddashboard.eirgrid.com/DashboardService.svc/csv?area=co2Intensity&地区= NI& datefrom = 30 - 2021年8月——% 2000:00& dateto = 30 - 2021年8月——% 2023:59

我正在开发一个函数来运行代理作业读取数据,然后将其解析为MSSQL数据库。例如

DECLARE @fromTime NVARCHAR(MAX)
DECLARE @toTime NVARCHAR(MAX)
DECLARE @URL2 NVARCHAR(MAX)
Set @fromTime = REPLACE(FORMAT(DATEADD(DAY, -1, GETDATE()), 'dd-MMM-yyyy 00:00' ), ' ','%20')
Set @toTime = REPLACE(FORMAT(DATEADD(DAY, -1,GETDATE()), 'dd-MMM-yyyy 23:59' ), ' ','%20')
select @URL2 = 
concat('http://smartgriddashboard.eirgrid.com/DashboardService.svc/csv?area=co2Intensity&region=ALL&datefrom=',@fromTime,'&dateto=',@toTime)
DECLARE @URL NVARCHAR(MAX) = @URL2
Declare @Object as Int;
Declare @ResponseText as Varchar(8000);
Declare @currenttime as datetime;
set @currenttime = GETDATE()
Exec sp_OACreate 'MSXML2.XMLHTTP', @Object OUT;
Exec sp_OAMethod @Object, 'open', NULL, 'get',
@URL,
'False'
Exec sp_OAMethod @Object, 'send'
Exec sp_OAMethod @Object, 'responseText', @ResponseText OUTPUT
IF((Select @ResponseText) <> '')
BEGIN
DECLARE @json NVARCHAR(MAX) = (Select @ResponseText)

--do something here to parce the return data
;with CTE(raw) as 
(
select replace(replace(replace(replace(@json,' ','-'),' ','-'),' ','-'),' ','-')
)
select * from CTE

select @json
--write to SQL table.
END
ELSE
BEGIN
DECLARE @ErroMsg NVARCHAR(30) = 'No data found.';
Print @ErroMsg;
END
Exec sp_OADestroy @Object

表中每行的分隔符为双空格。每列的分隔符为","。

第一个双空格分隔符之前的第一组数据是列/表标题,并且是固定的。例如

DATE & TIME  |  CO2 INTENSITY (gCO2/kWh)  |  REGION

上面URL的示例输出

DATE & TIME, CO2 INTENSITY (gCO2/kWh), REGION  28 August 2021 00:00,470,All Island  28 August 2021 00:15,471,All Island  28 August 2021 00:30,471,All Island  28 August 2021 00:45,472,All Island  28 August 2021 01:00,470,All Island  28 August 2021 01:15,471,All Island  28 August 2021 01:30,468,All Island  28 August 2021 01:45,471,All Island  28 August 2021 02:00,468,All Island  28 August 2021 02:15,460,All Island  28 August 2021 02:30,452,All Island  28 August 2021 02:45,450,All Island  28 August 2021 03:00,464,All Island  28 August 2021 03:15,469,All Island  28 August 2021 03:30,472,All Island  28 August 2021 03:45,476,All Island  28 August 2021 04:00,472,All Island  28 August 2021 04:15,477,All Island  28 August 2021 04:30,488,All Island  28 August 2021 04:45,485,All Island  28 August 2021 05:00,488,All Island  28 August 2021 05:15,489,All Island  28 August 2021 05:30,487,All Island  28 August 2021 05:45,493,All Island  28 August 2021 06:00,492,All Island  28 August 2021 06:15,472,All Island  28 August 2021 06:30,469,All Island  28 August 2021 06:45,468,All Island  28 August 2021 07:00,472,All Island  28 August 2021 07:15,476,All Island  28 August 2021 07:30,482,All Island  28 August 2021 07:45,473,All Island  28 August 2021 08:00,465,All Island  28 August 2021 08:15,463,All Island  28 August 2021 08:30,467,All Island  28 August 2021 08:45,470,All Island  28 August 2021 09:00,460,All Island  28 August 2021 09:15,461,All Island  28 August 2021 09:30,463,All Island  28 August 2021 09:45,472,All Island  28 August 2021 10:00,478,All Island  28 August 2021 10:15,475,All Island  28 August 2021 10:30,475,All Island  28 August 2021 10:45,473,All Island  28 August 2021 11:00,459,All Island  28 August 2021 11:15,462,All Island  28 August 2021 11:30,445,All Island  28 August 2021 11:45,438,All Island  28 August 2021 12:00,433,All Island  28 August 2021 12:15,435,All Island  28 August 2021 12:30,429,All Island  28 August 2021 12:45,424,All Island  28 August 2021 13:00,430,All Island  28 August 2021 13:15,430,All Island  28 August 2021 13:30,431,All Island  28 August 2021 13:45,429,All Island  28 August 2021 14:00,419,All Island  28 August 2021 14:15,418,All Island  28 August 2021 14:30,416,All Island  28 August 2021 14:45,421,All Island  28 August 2021 15:00,411,All Island  28 August 2021 15:15,424,All Island  28 August 2021 15:30,430,All Island  28 August 2021 15:45,429,All Island  28 August 2021 16:00,424,All Island  28 August 2021 16:15,423,All Island  28 August 2021 16:30,416,All Island  28 August 2021 16:45,422,All Island  28 August 2021 17:00,425,All Island  28 August 2021 17:15,439,All Island  28 August 2021 17:30,439,All Island  28 August 2021 17:45,440,All Island  28 August 2021 18:00,440,All Island  28 August 2021 18:15,426,All Island  28 August 2021 18:30,439,All Island  28 August 2021 18:45,429,All Island  28 August 2021 19:00,437,All Island  28 August 2021 19:15,440,All Island  28 August 2021 19:30,436,All Island  28 August 2021 19:45,441,All Island  28 August 2021 20:00,434,All Island  28 August 2021 20:15,446,All Island  28 August 2021 20:30,434,All Island  28 August 2021 20:45,447,All Island  28 August 2021 21:00,443,All Island  28 August 2021 21:15,456,All Island  28 August 2021 21:30,447,All Island  28 August 2021 21:45,448,All Island  28 August 2021 22:00,447,All Island  28 August 2021 22:15,426,All Island  28 August 2021 22:30,418,All Island  28 August 2021 22:45,433,All Island  28 August 2021 23:00,438,All Island  28 August 2021 23:15,440,All Island  28 August 2021 23:30,434,All Island  28 August 2021 23:45,434,All Island

前3行和3列的示例如下:

28 August 2021 00:00  |  470  |  All Island
28 August 2021 00:15  |  471  |  All Island
28 August 2021 00:30  |  471  |  All Island

所以我想要实现的输出是这样的表

DATE & TIME     |  CO2 INTENSITY (gCO2/kWh)  |  REGION
-----------------------------------------------------------------
28 August 2021 00:00  |          470               |  All Island
28 August 2021 00:15  |          471               |  All Island
28 August 2021 00:30  |          471               |  All Island

我已经尝试了分割字符串和批量导入的真实性,并没有管理它。例如

--;with CTE(raw) as 
--(
----select @Data as raw
--select replace(replace(replace(replace(@Data ,' ','-'),' ','-'),' 
','- 
'),' ','-')
--   )
--select * from CTE
-- write to file first and then import e.g.
--BULK INSERT [AutoData].[dbo].[Eirgrid_Co2Data]
--FROM 'C:TempDataFileName.csv'
--WITH ( FORMAT='CSV');
-- Tried to read it as JSON and XML!
--select * from OPENJSON(@Data)
select @Data
-- splitting!
--  SELECT value  
--FROM STRING_SPLIT(@Data, ',')
----FROM STRING_SPLIT(@Data, '  ')  'split of double space!
--WHERE RTRIM(value) <> '';

看起来行分隔符是2个空格。不幸的是,SQL Server不容易分割2个空间。它还出现了一些离群数据,即ItemNumber=7,其中有一个实际的回车。为了蛮力完成,这里用波浪'~'替换2个空格(以及1个空格和1个回车符),然后使用顺序分隔符在波浪上进行分割。

这里描述的顺序分配器

CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0) 
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
;

查询

declare @string         nvarchar(max)=N'DATE & TIME, CO2 INTENSITY (gCO2/kWh), REGION  27 August 2021 00:00,465,All Island  27 August 2021 00:15,459,All Island  27 August
2021 00:30,463,All Island  27 August 2021 00:45,464,All Island  27
August 2021 01:00,461,All Island  27 August 2021 01:15,465,All Island 
27 August 2021 01:30,466,All Island  27 August 2021 01:45,470,All
Island  27 August 2021 02:00,466,All Island';

select *
from dbo.DelimitedSPlit8K_LEAD(replace(replace(@string, ' 
','~'), '  ','~'), '~') dl;
ItemNumber  Item
1           DATE & TIME, CO2 INTENSITY (gCO2/kWh), REGION
2           27 August 2021 00:00,465,All Island
3           27 August 2021 00:15,459,All Island
4           27 August  2021 00:30,463,All Island
5           27 August 2021 00:45,464,All Island
6           27  August 2021 01:00,461,All Island
7           27 August 2021 01:15,465,All Island
8           27 August 2021 01:30,466,All Island
9           27 August 2021 01:45,470,All  Island
10          27 August 2021 02:00,466,All Island

我使用多年的方法是在这里找到的计数表解决方案。

最后是CTE varchar版本的函数。如果需要,上面的链接也有nvarchar版本。

需要注意的是,您的数据似乎没有一致的分隔符,因此您可能需要多次使用以下分隔符。

作为示例,使用上面数据的第一行:

DECLARE @Rowdata Varchar(100) = '27 August 2021 02:25,465,All Island' ;
SELECT * FROM [dbo].[DelimitedSplit8K](@Rowdata,',')

产生以下结果集:

ItemNumber)项目

12012.8.27 02:25

2465

3All Island

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item       = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;

相关内容

  • 没有找到相关文章

最新更新