HTML unescaping in T-SQL (SQL Server 2014)



I have a column in a SQL Server database that stores blocks of text in the following form:

<HTML><HEAD><style type="text/css">BODY,TD,TH,BUTTON,INPUT,SELECT,TEXTAREA{FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: Arial,Helvetica;}BODY{MARGIN: 5px;}P,DIV,UL,OL,BLOCKQUOTE{MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px;}</style></HEAD><BODY> <p style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px">Patient is a&nbsp;84 year old female.&nbsp; Patient's histpry includes the following:</p> <p style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px">&nbsp;</p></BODY></HTML>

What I want to get back from this particular example is just:

Patient is an 84 year old female. Patient's histpry includes the following:

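Honestly, I don't even know where to start. Is there any kind of HTML-unescape function in SQL Server 2014? I don't have access to the CLI, and the code has to run inside a stored procedure that I am responsible for creating.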

If you are open to a table-valued function, consider the following.

Tired of extracting strings (LEFT, RIGHT, CHARINDEX, PATINDEX, REVERSE, etc.), I modified a split/parse function to accept two non-like delimiters. In this case, > and </

Also, being a TVF, it is easy to incorporate into a CROSS APPLY if your data is in a table.

Declare @S varchar(max)='<HTML><HEAD><style type="text/css">BODY,TD,TH,BUTTON,INPUT,SELECT,TEXTAREA{FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: Arial,Helvetica;}BODY{MARGIN: 5px;}P,DIV,UL,OL,BLOCKQUOTE{MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px;}</style></HEAD><BODY> <p style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px">Patient is a&nbsp;84 year old female.&nbsp; Patient''s histpry includes the following:</p> <p style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px">&nbsp;</p></BODY></HTML>'
Select *
From  [dbo].[tvf-Str-Extract](replace(@S,'&nbsp;',' '),'>','</')
Where RetVal<>' '
and RetVal not like 'BODY,%'

Returns

RetSeq  RetPos  RetVal
2       284     Patient is a 84 year old female.  Patient's histpry includes the following:

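Note: the WHERE clause is optional and may need to be adjusted to fit your actual data. Just for fun, try it without the WHERE. Also, in this example we only trap &nbsp;, but as you know there may be many other entities, e.g. &mdash;.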
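As a minimal sketch of handling more entities than &nbsp;, nested REPLACE calls can be chained. The sample string below is made up for illustration, and the list of entities is an assumption about what your data contains; each additional entity (such as &mdash;) would need its own REPLACE. Note that &amp; is replaced last so that text like &amp;lt; is not unescaped twice.

```sql
-- Illustrative sample text, not taken from the question.
Declare @T varchar(max) = 'Tom&nbsp;&amp;&nbsp;Jerry &lt;cat &amp; mouse&gt;';

-- Replace &amp; last so an escaped ampersand cannot create a new entity.
Select Unescaped =
       replace(replace(replace(replace(@T
              ,'&nbsp;',' ')
              ,'&lt;'  ,'<')
              ,'&gt;'  ,'>')
              ,'&amp;' ,'&');
-- Returns: Tom & Jerry <cat & mouse>
```

If you combine this with the extract function, it is probably safer to unescape &lt; and &gt; on RetVal after the extraction rather than on the input, since literal < and > characters would otherwise be mistaken for tag delimiters.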

The function, if interested:

CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table 
As
Return (  
    -- cte1: 10 rows; cte2: tally 1..DataLength(@String), capped at 10^6 rows
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
           -- cte3: candidate start positions (1, plus the position just past each @Delimiter1)
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
           -- cte4: length from each start up to the next @Delimiter1 (8000 if there is none)
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)
    -- Keep only segments that contain @Delimiter2, trimmed just before it
    Select RetSeq = Row_Number() over (Order By N)
          ,RetPos = N
          ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1) 
     From  (
            Select *,RetVal = Substring(@String, N, L) 
             From  cte4
           ) A
     Where charindex(@Delimiter2,RetVal)>1
)
/*
Max Length of String 1MM characters
Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/
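
Because the function is an inline TVF, it can be applied row by row with CROSS APPLY, as mentioned above. The following is only a sketch: @Notes, NoteId, and NoteHtml are hypothetical stand-ins for your real table and columns, and it assumes [dbo].[tvf-Str-Extract] has already been created.

```sql
-- Hypothetical table variable standing in for the real table.
Declare @Notes table (NoteId int, NoteHtml varchar(max));
Insert Into @Notes values
 (1,'<BODY><p>First&nbsp;note</p></BODY>'),
 (2,'<BODY><p>Second note</p><p>&nbsp;</p></BODY>');

-- One call of the function per row; blank-only segments are filtered out.
Select N.NoteId
      ,X.RetSeq
      ,X.RetVal
 From  @Notes N
 Cross Apply [dbo].[tvf-Str-Extract](replace(N.NoteHtml,'&nbsp;',' '),'>','</') X
 Where X.RetVal<>' ';
```

With this sample data, the query should return one row per note, with the extracted text in RetVal. The same SELECT can be placed inside a stored procedure unchanged.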

With HTML, you can never be sure that a cast to XML will succeed. However, after replacing &nbsp; with a plain space, you might do this:

Declare @S varchar(max)='<HTML><HEAD><style type="text/css">BODY,TD,TH,BUTTON,INPUT,SELECT,TEXTAREA{FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: Arial,Helvetica;}BODY{MARGIN: 5px;}P,DIV,UL,OL,BLOCKQUOTE{MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px;}</style></HEAD><BODY> <p style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px">Patient is a&nbsp;84 year old female.&nbsp; Patient''s histpry includes the following:</p> <p style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px">&nbsp;</p></BODY></HTML>'
SELECT CAST(REPLACE(@S,'&nbsp;',' ') AS XML).value('(//p/text())[1]','nvarchar(max)');

Result

Patient is a 84 year old female.  Patient's histpry includes the following:
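
Since the cast can fail on HTML that is not well-formed XML, one possible guard is TRY_CAST (available from SQL Server 2012), which returns NULL instead of raising an error. This is a sketch with a deliberately broken, made-up sample string; the fallback message is just a placeholder for whatever alternative parsing you choose.

```sql
-- Made-up sample: the second <p> is never closed, so this is not well-formed XML.
Declare @H varchar(max) = '<BODY><p>First&nbsp;line</p><p>unclosed</BODY>';

-- TRY_CAST yields NULL when the string cannot be parsed as XML.
Declare @X xml = TRY_CAST(replace(@H,'&nbsp;',' ') AS xml);

Select Result = Case
                  When @X is null Then 'not well-formed; fall back to string parsing'
                  Else @X.value('(//p/text())[1]','nvarchar(max)')
                End;
```

For well-formed input, the ELSE branch behaves like the CAST version above and returns the first text node under a p element.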
