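I'm trying to solve the same problem as in this question, but this time in SQL Server 2014. I need to check whether two strings consist of the same words:
Should return true: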
Antoine de Saint-Exupéry = de Saint-Exupéry Antoine = Saint-Exupéry Antoine de = etc.
and
should return false:
Antoine de Saint-Exupéry != Antoine de Saint != Antoine Antoine de Saint-Exupéry != etc.
What options are there in SQL Server 2014? Is there a built-in function for this kind of comparison?
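To compare two strings, you can (ab)use the ordering support in XQuery.
Turn the string into XML with one <x> element per word (by replacing each space with </x><x>), sort the elements, and then return the text without the markup.
For example: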
DECLARE @Words1 NVARCHAR(MAX) = N'Antoine de Saint-Exupéry';
DECLARE @Words2 NVARCHAR(MAX) = N'Saint-Exupéry Antoine de';
DECLARE @SortedWords1 NVARCHAR(MAX) = cast('<x>'+replace(@Words1,' ','</x><x>')+'</x>' as XML).query('for $x in /x order by $x ascending return $x').value('.','nvarchar(max)');
DECLARE @SortedWords2 NVARCHAR(MAX) = cast('<x>'+replace(@Words2,' ','</x><x>')+'</x>' as XML).query('for $x in /x order by $x ascending return $x').value('.','nvarchar(max)');
DECLARE @SameWords BIT = (case
when @SortedWords1 = @SortedWords2
then 1
else 0
end);
SELECT @SameWords as SameWords;
Returns:
SameWords
---------
True
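One caveat with the version above: .value('.', 'nvarchar(max)') concatenates the sorted words with no separator, so 'ab c' and 'a bc' would both collapse to 'abc' and compare as equal. A minimal sketch of a variant, assuming it is acceptable to compare the serialized XML instead of plain text, keeps the <x> tags as word boundaries by casting the sorted XML to nvarchar(max). The variable names are only for illustration, and like the original it assumes the input contains no XML special characters such as & or <, which would make the cast to XML fail.

```sql
DECLARE @Words1 NVARCHAR(MAX) = N'ab c';
DECLARE @Words2 NVARCHAR(MAX) = N'a bc';

-- Split on spaces into <x> elements, sort them with XQuery, then serialize the
-- sorted XML back to text. The <x>...</x> tags are kept, so word boundaries survive.
DECLARE @Sorted1 NVARCHAR(MAX) = CAST(
    CAST(N'<x>' + REPLACE(@Words1, N' ', N'</x><x>') + N'</x>' AS XML).query('for $x in /x order by $x ascending return $x')
    AS NVARCHAR(MAX));
DECLARE @Sorted2 NVARCHAR(MAX) = CAST(
    CAST(N'<x>' + REPLACE(@Words2, N' ', N'</x><x>') + N'</x>' AS XML).query('for $x in /x order by $x ascending return $x')
    AS NVARCHAR(MAX));

-- '<x>ab</x><x>c</x>' vs '<x>a</x><x>bc</x>': not equal, so this returns 0
SELECT SameWords = CASE WHEN @Sorted1 = @Sorted2 THEN 1 ELSE 0 END;
```

The trade-off is that the compared values carry the markup, which is harmless here because both sides are produced the same way.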
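Here is one way you could roll your own. I'm using Jeff Moden's string splitter; you can find the original article here: http://www.sqlservercentral.com/articles/Tally+Table/72993/. If you don't like that splitter, there are several other great versions here: https://sqlperformance.com/2012/07/t-sql-queries/split-strings. I like Jeff Moden's because, unlike many other splitters, it also returns an ItemNumber (the position of each item in the original string), which is very useful in some situations.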
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
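To see what the splitter returns, including the ItemNumber column mentioned above, a quick call might look like this. It assumes the function has been created in the dbo schema exactly as shown; the ORDER BY is there because rows from a query are not guaranteed to come back in any order without one.

```sql
-- Split a phrase on spaces and list the words in their original order
SELECT s.ItemNumber, s.Item
FROM dbo.DelimitedSplit8K('Antoine de Saint-Exupéry', ' ') AS s
ORDER BY s.ItemNumber;
-- ItemNumber  Item
-- 1           Antoine
-- 2           de
-- 3           Saint-Exupéry
```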
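The basic idea is to split each string into words and then compare the two sets of words. I used a couple of CTEs so it is clear how this works. A plain join on the word alone would treat "Antoine Antoine de Saint-Exupéry" as equal to "Antoine de Saint-Exupéry", so each repeated word is also numbered (Occurrence) and the join matches on both columns; any word left without a partner means the phrases differ. The following works for all of the examples you posted.
declare @Phrase1 nvarchar(100) = 'Antoine de Saint-Exupéry'
    , @Phrase2 nvarchar(100) = 'de Saint-Exupéry Antoine'
;
with Phrase1 as
(
    -- one row per word; Occurrence tells repeats of the same word apart
    select Item
        , Occurrence = row_number() over (partition by Item order by ItemNumber)
    from dbo.DelimitedSplit8K(@Phrase1, ' ')
)
, Phrase2 as
(
    select Item
        , Occurrence = row_number() over (partition by Item order by ItemNumber)
    from dbo.DelimitedSplit8K(@Phrase2, ' ')
)
-- count the words that have no match in the other phrase;
-- the phrases are equal only when there are none
select PhrasesEqual = convert(bit, case when count(*) = 0 then 1 else 0 end)
from Phrase1 p1
full outer join Phrase2 p2 on p2.Item = p1.Item
    and p2.Occurrence = p1.Occurrence
where p1.Item is null
    or p2.Item is null
;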
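If the comparison has to run against many rows, one option is to wrap the split-and-join idea in an inline table-valued function and CROSS APPLY it. This is a sketch under assumptions, not part of the original answer: the name dbo.PhrasesHaveSameWords is hypothetical, it depends on dbo.DelimitedSplit8K from above, and it numbers repeated words so that an extra copy of a word makes the phrases unequal.

```sql
-- Hypothetical helper: 1 if both phrases contain the same words (any order,
-- repeats counted), else 0. Always returns exactly one row.
CREATE FUNCTION dbo.PhrasesHaveSameWords
    (@Phrase1 VARCHAR(8000), @Phrase2 VARCHAR(8000))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH Phrase1 AS
(
    SELECT Item,
           Occurrence = ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ItemNumber)
    FROM dbo.DelimitedSplit8K(@Phrase1, ' ')
),
Phrase2 AS
(
    SELECT Item,
           Occurrence = ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ItemNumber)
    FROM dbo.DelimitedSplit8K(@Phrase2, ' ')
)
-- Unmatched words on either side mean the phrases differ
SELECT PhrasesEqual = CONVERT(bit, CASE WHEN COUNT(*) = 0 THEN 1 ELSE 0 END)
FROM Phrase1 p1
FULL OUTER JOIN Phrase2 p2
    ON p2.Item = p1.Item
   AND p2.Occurrence = p1.Occurrence
WHERE p1.Item IS NULL
   OR p2.Item IS NULL;
GO
-- Usage: returns 0, because the first phrase has an extra "Antoine"
SELECT PhrasesEqual
FROM dbo.PhrasesHaveSameWords('Antoine Antoine de Saint-Exupéry', 'Antoine de Saint-Exupéry');
```

Against a table (for example a hypothetical dbo.Authors with columns NameA and NameB) it would be used as CROSS APPLY dbo.PhrasesHaveSameWords(a.NameA, a.NameB). Word matching follows the collation of the split items, so under a case-insensitive collation "Antoine" and "antoine" count as the same word.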