解析字符串对于 charindex 来说太长



在 TSQL (SSMS 2016) 中,我尝试使用 WHILE 循环、临时表和 CHARINDEX 解析长字符串中的重复数据。每次循环运行时,它都会使用上一个停止点作为下一个起点。循环有效,但 CHARINDEX 似乎有 8000 个字符的限制,字符串比这长得多。有没有另一种方法可以从长度超过 8000 个字符的字符串中解析数据?

编辑- 我正在尝试从长字符串(超过 100,000 个字符)中提取由属性标签指示的名称。数据看起来像这样,但连接成一个长字符串:

<alarm-response-list xmlns="http://www.thePlace.com" total-alarms="862" throttle="862" error="EndOfResults">
<alarm-responses>
<alarm id="5afeeaac-355f-11a0-02bd-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Narnia</attribute>
</alarm>
<alarm id="5b5724cb-e0be-1016-0275-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Mordor</attribute>
</alarm>
<alarm id="5b4af6e5-8f8d-103e-023d-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Atlantis</attribute>
</alarm>

在此示例中,我想要属性 ID 为"0x1006e"的任何内容。

编辑- 请参阅下面的示例代码。只要 WHILE 语句受到小于 8000 的数字的限制,代码就可以正常运行。之后,限制为 8000 个字符的 CHARINDEX 开始生效。

DECLARE @temp TABLE(modelName VARCHAR(300))
DECLARE @ctr INT = (SELECT MIN(ID) FROM [dbo].[Alarms])
DECLARE @start INT = (SELECT CHARINDEX('1006E',Results)+7 FRP FROM [dbo]. 
[Alarms] WHERE ID = @ctr) 
DECLARE @len INT = (SELECT  
CHARINDEX('</attr',Results,CHARINDEX('1006E',Results)) - 
CHARINDEX('1006E',Results) - 7 FROM [dbo].[Alarms] WHERE ID = @ctr)
DECLARE @totalLen INT = (SELECT LEN(CAST(results AS VARCHAR(MAX))) FROM 
dbo.Alarms WHERE ID = @ctr)
WHILE @start < 5000 BEGIN
INSERT @temp 
SELECT SUBSTRING(Results,@start,@len) Name 
FROM [dbo].[Alarms]
WHERE ID = @ctr
SET @start = (SELECT CHARINDEX('1006E',Results,@start + 1)+7 FRP FROM [dbo]. 
[Alarms] WHERE ID = @ctr) 
SET @len = (SELECT  
CHARINDEX('</attr',Results,CHARINDEX('1006E',Results,@start+1)) -
CHARINDEX('1006E',Results,@start + 1) - 7 FROM [dbo].[Alarms] WHERE ID 
= @ctr)
END
Select * from @temp

我将其添加为新答案。

如评论中所述:
请避免变色龙问题...在您编辑后,这完全是另一回事,并使现有答案无效......对于未来:如果您发现需要更改问题,最好通过接受最佳答案(回答初始问题的答案)来关闭现有问题。然后开始一个新问题。

使用本机 XML 的方法解决您的问题

James,你的字符串没什么荒谬的,没有什么你必须解析的,而只是XML。有现有的工具可以阅读此内容。几乎任何编程语言都将提供对XPathXQuery的支持。这不是你自己应该做的...

尝试一下,并返回任何问题(但在一个新问题中)

DECLARE @xml XML=
N'<alarm-response-list xmlns="http://www.thePlace.com" total-alarms="862" throttle="862" error="EndOfResults">
<alarm-responses>
<alarm id="5afeeaac-355f-11a0-02bd-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Narnia</attribute>
</alarm>
<alarm id="5b5724cb-e0be-1016-0275-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Mordor</attribute>
</alarm>
<alarm id="5b4af6e5-8f8d-103e-023d-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Atlantis</attribute>
</alarm>
<!-- have to append closing nodes -->
</alarm-responses>
</alarm-response-list>';
WITH XMLNAMESPACES(DEFAULT 'http://www.thePlace.com')
SELECT @xml.value('(/alarm-response-list/@total-alarms)[1]','int') AS TotalAlarms
,@xml.value('(/alarm-response-list/@throttle)[1]','int') AS throttle
,@xml.value('(/alarm-response-list/@error)[1]','nvarchar(max)') AS error
,alarm.value('@id','uniqueidentifier') AS Alarm_id
,attr.value('@id','nvarchar(max)') AS Alarm_Attribute_id
,attr.value('text()[1]','nvarchar(max)') AS Alarm_Attribute_content
FROM @xml.nodes('/alarm-response-list/alarm-responses/alarm') A(alarm)
OUTER APPLY alarm.nodes('attribute') B(attr);

结果

+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| TotalAlarms | throttle | error        | Alarm_id                             | Alarm_Attribute_id | Alarm_Attribute_content |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5AFEEAAC-355F-11A0-02BD-0080101C40B8 | 0x12d7f            | ##.##.###.###           |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5AFEEAAC-355F-11A0-02BD-0080101C40B8 | 0x1006e            | Narnia                  |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B5724CB-E0BE-1016-0275-0080101C40B8 | 0x12d7f            | ##.##.###.###           |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B5724CB-E0BE-1016-0275-0080101C40B8 | 0x1006e            | Mordor                  |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B4AF6E5-8F8D-103E-023D-0080101C40B8 | 0x12d7f            | ##.##.###.###           |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B4AF6E5-8F8D-103E-023D-0080101C40B8 | 0x1006e            | Atlantis                |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+

使用谓词获取所需的答案

这将使用.nodes()中的谓词来检索所有<attribute>元素的派生表,其中@id具有给定值。

WITH XMLNAMESPACES(DEFAULT 'http://www.thePlace.com')
SELECT a.value('text()[1]','nvarchar(max)') AS Alarm_Attribute_content
FROM @xml.nodes('//attribute[@id="0x1006e"]') A(a)

结果

Alarm_Attribute_content
------
Narnia
Mordor
Atlantis

@James Pratt,我做了一个模拟,给你另一个使用循环的逻辑。

这个使用 CTE 来构建 2000 行(并且有扩展的技术)。

DECLARE @Alarms table (Id int, Results varchar(max))
INSERT into @Alarms 
SELECT 1, '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]...'

;WITH x AS 
(
SELECT  TOP (2000) n = ROW_NUMBER() OVER (ORDER BY Number)
FROM master.dbo.spt_values ORDER BY Number
)
SELECT 
RIGHT(t.Results, x.n) 
FROM x
JOIN @Alarms AS t ON 
x.n < LEN(t.Results) 

如果对 TVF 开放。

厌倦了提取字符串(左,右,字符索引,patindex等),我修改了一个解析/拆分函数以接受两个非类似的delimeter。

Declare @S varchar(max) = '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]'
Select Seq   = A.RetSeq
,Item  = B.RetVal
,Value = A.RetVal
From  [dbo].[tvf-str-extract](@S,']','[') A
Join ( Select Seq=Row_Number() over (Order by RetSeq),* 
From [dbo].[tvf-str-extract](@S,'[',']') 
Where charindex('/',RetVal)=0
) B on B.Seq=A.RetSeq
Order By A.RetSeq

返回

Seq Item    Value
1   A       dkdk
2   B       123
3   N       Fred
4   A       ddj
5   B       456
6   N       Bill
7   A       akdl

函数(如果感兴趣)

CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table 
As
Return (  
with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By N)
,RetPos = N
,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1) 
From  (
Select *,RetVal = Substring(@String, N, L) 
From  cte4
) A
Where charindex(@Delimiter2,RetVal)>1
)
/*
Max Length of String 1MM characters

只是为了帮助视觉对象,别名A或作为独立

Select * From  [dbo].[tvf-str-extract](@S,']','[') A

返回

RetSeq  RetPos  RetVal
1       4       dkdk
2       15      123
3       25      Fred
4       36      ddj
5       46      456
6       56      Bill
7       67      akdl

这看起来像是一种奇怪的XML...可能不是对所有字符串都有效,但给定的示例可以很容易地传输到 XML:

DECLARE @tbl table (Id INT IDENTITY, YourString VARCHAR(MAX))
INSERT INTO @tbl VALUES
('[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]...');
SELECT CAST(REPLACE(REPLACE(t.YourString,'[','<'),']','>') AS XML)
FROM @tbl t

结果

<A>dkdk</A>
<B>123</B>
<N>Fred</N>
<A>ddj</A>
<B>456</B>
<N>Bill</N>
<A>akdl</A>...

逐部分阅读非常容易。

但是 - 老实说 - 在很多很多情况下,这种方法会中断。

只要字符串VARCHAR(MAX),我认为无论如何它都应该有效。无论如何,这是我使用 TVF 的看法:

CREATE FUNCTION Tools.FindElements (@String VARCHAR(MAX)
, @Open   VARCHAR(8000)
, @Close  VARCHAR(8000))
/*
This function splits a VARCHAR string into a table of elements/items by finding opening and closing tags
The table returns ID (by the order of occurance) and String (the element)
This is based off Jeff Moden's DelimitedSplit8K function: http://www.sqlservercentral.com/articles/Tally+Table/72993/
It has been modified to:
1) Accept delimiters longer than 1 character
2) Use two delimiters (opening and closing tags)
*/   
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
/* 10 rows (all 1s) */
WITH CTE_10
AS (SELECT Number
FROM(VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1) ) v(Number)),
-------------------
/* 100 rows (all 1s) */
CTE_100
AS (SELECT Number = 1
FROM CTE_10 a
CROSS JOIN CTE_10 b),
-------------------
/* 10000 rows max (all 1s) */
CTE_10000
AS (SELECT Number = 1
FROM CTE_100 a
CROSS JOIN CTE_100 b),
-------------------
/* 100000000 rows max (all 1s) - this limits the number of elements to 100 million (which I hope is enough)) */
CTE_100000000
AS (SELECT Number = 1
FROM CTE_10000 a
CROSS JOIN CTE_10000 b),
-------------------
/* Numbers "Table" CTE: 
1) TOP has variable parameter = DATALENGTH(@String)
2) Use ROW_NUMBER */
CTE_Numbers
AS (SELECT TOP (ISNULL(DATALENGTH(@String), 0)) Number = ROW_NUMBER() OVER(ORDER BY (SELECT NULL) )
FROM CTE_100000000),
-------------------
/* Returns start of the element after each delimiter */
CTE_Start
AS (SELECT [Start] = Number + DATALENGTH(@Open)
FROM CTE_Numbers
WHERE SUBSTRING(@String, Number, DATALENGTH(@Open)) = @Open),
-------------------
/* IF @Delimiter <> '': Returns start and length (for use in substring) */
CTE_Length
AS (SELECT [Start]
, [Length] = ISNULL(NULLIF(CHARINDEX(@Close, @String, [Start]), 0) - [Start], 8000) -- ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
FROM CTE_Start)

/* Do the actual split */
SELECT ID = ROW_NUMBER() OVER(ORDER BY [Start])
, String = SUBSTRING(@String, [Start], [Length])
FROM CTE_Length;

然后像这样使用它:

SELECT *
FROM Tools.FindElements ('[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]', '[N]', '[/N]');

或者像这样:

SELECT *
FROM dbo.TextTable
CROSS APPLY Tools.FindElements (TextColumn, '[N]', '[/N]');

最新更新