查找两个text/varchar列之间第一个不匹配的位置



我有两列应该包含相同的文本——有时当内容很大时,很难找到差异的实际位置。

这并不完美,但如果有一个函数可以接受两列的值并返回第一个匹配发生的位置,那将非常有帮助。由于这将被称为select中的表单,因此性能将相当重要,但它只会偶尔运行,因此不是一个大问题。

或者,一个可以在源代码管理实用程序中执行类似DIFF的功能将是理想的,但我无法想象这会有多复杂。

您可以对该解决方案进行大量改进,但部分来自以下逻辑:

declare @i as int
While @i < LEN(String1)
BEGIN
  SELECT @i AS "index"
  FROM YourTable
  WHERE SUBSTRING(String1, @i, 1) != SUBSTRING(String2, @i, 1)
  set @i = @i + 1
END

要查找两列是否不同,可以使用BINARY_CHECKSUM()。这足够敏感,甚至可以区分大写和小写。

至于查找两个字符串之间的差异,我会使用2个cte,然后使用EXCEPT。第一个结果将是字符串的第一个偏差。

declare @test1 VARCHAR(1000) = 'I have two columns that should contain identical text - sometimes when the contents are large it is very difficult to find where the discrepancy is actually located.'
declare @test2 VARCHAR(1000) = 'I have two columns that should contain identical text - sometimes when the contents are Large it is very difficult to find where the discrepancy is actually located.'
SELECT CASE WHEN BINARY_CHECKSUM(@test1) = BINARY_CHECKSUM(@test2) THEN 'Identical' ELSE 'Not Identical' END AS check_

;with cte1 AS (
SELECT 
1 AS col_
UNION ALL 
SELECT col_ + 1
FROM cte1 
WHERE col_ < (SELECT LEN(@test1))
), 
 cte2 AS (
SELECT 
1 AS col_
UNION ALL 
SELECT col_ + 1
FROM cte2 
WHERE col_ < (SELECT LEN(@test2))
), final_cte AS (
SELECT col_, SUBSTRING(@test1, col_, 1) AS char_, BINARY_CHECKSUM(SUBSTRING(@test1, col_, 1)) AS char_checksum
FROM CTE1 
EXCEPT 
SELECT col_, SUBSTRING(@test2, col_, 1) AS char_, BINARY_CHECKSUM(SUBSTRING(@test2, col_, 1)) AS char_checksum
FROM CTE2
)
select col_, char_
FROM final_cte 
ORDER by col_ 
OPTION (MAXRECURSION 1000)

尝试一下:

CREATE FUNCTION dbo.fcn_DiffPosition
(
    @str1 nvarchar(max),
    @str2 nvarchar(max)
)
RETURNS INT
AS
BEGIN
    DECLARE @MinPosition int = NULL
    ;WITH cte1 AS
    (
        SELECT      1                       AS CharacterPosition,
                    SUBSTRING(@str1, 1, 1)  AS [Character]
        UNION ALL
        SELECT      CharacterPosition + 1,
                    SUBSTRING(@str1, CharacterPosition + 1, 1)
        FROM        cte1
        WHERE       CharacterPosition < LEN(@str1)
    ),
    cte2 AS
    (
        SELECT      1                       AS CharacterPosition,
                    SUBSTRING(@str2, 1, 1)  AS [Character]
        UNION ALL
        SELECT      CharacterPosition + 1,
                    SUBSTRING(@str2, CharacterPosition + 1, 1)
        FROM        cte2
        WHERE       CharacterPosition < LEN(@str2)
    )
    SELECT      @MinPosition = MIN(ISNULL(cte1.CharacterPosition, cte2.CharacterPosition))
    FROM        cte1
    FULL JOIN   cte2 ON cte1.CharacterPosition = cte2.CharacterPosition
    WHERE       ISNULL(cte1.[Character], '') != ISNULL(cte2.[Character], '')
    OPTION      (MAXRECURSION 0)
    RETURN @MinPosition
END

如果两个字符串相同,函数将返回null。根据您的排序规则,这可能区分大小写。示例:

SELECT dbo.fcn_DiffPosition('Hello World', 'HelloWorld') -- 6
SELECT dbo.fcn_DiffPosition('Hello',       'HelloWorld') -- 6
SELECT dbo.fcn_DiffPosition('Dog', 'Dog')                -- NULL

相关内容

最新更新