字符串模糊匹配



我有一个关于在MSSQL数据库中匹配字符串的问题。基本上,我有一个包含ICD9和CPT代码的表。问题是这些代码的格式通常是不正确的(例如,字符太多,缺少小数等)。我需要能够从包含正确代码的查找表中查找这些代码中的每个代码的描述。

由于这些代码的结构方式,我可以做一些"渐进式"匹配,至少可以找到代码的类别。

假设正确的代码是:306.98

对于这个例子,让我们假设在306和307之间没有其他值。

我想去掉小数点并查找匹配,一次一个字符,直到找不到。然后选择最后匹配的字符串。

所以306、3069、3098、306981、3069812等等…匹配字符串306.98.

我希望这对每个人都有意义。我不确定我该如何开始这样做,所以任何建议都将是一个很大的帮助。

一种可能的解决方案是将代码剥离到其基本元素(306),然后执行类似的操作符:

WHERE Code LIKE '306%'

使用FLOOR函数去掉小数部分,然后在WHERE子句中使用LIKE操作符。

例如:

SELECT <COLUMN-LIST>
  FROM <TABLE-NAME>
 WHERE <THE-COLUMN> LIKE CAST(FLOOR(306.09) AS VARCHAR) + '%'

这是您的示例。您只需要将value转换为nvarchar @string。

DECLARE @string AS NVARCHAR (MAX) = '306.98';
DECLARE @Table TABLE (
    TextVal NVARCHAR (MAX));
INSERT INTO @Table ([TextVal])
SELECT '4444656'
UNION ALL
SELECT '30'
UNION ALL
SELECT '3069'
UNION ALL
SELECT '306989878787'
;
WITH   numbers
AS     (SELECT ROW_NUMBER() OVER ( ORDER BY (SELECT 1)) AS Number
        FROM   [sys].[objects] AS o1 CROSS JOIN [sys].[objects] AS o2),
       Chars
AS     (SELECT SUBSTRING(@string, [Number], 1) AS Let,
               [Number]
        FROM   [numbers]
        WHERE  [Number] <= LEN(@string)),
       Joined
AS     (SELECT [Let],
               CAST (1 AS BIGINT) AS Number
        FROM   chars
        WHERE  [Number] = 1
        UNION ALL
        SELECT [J].[Let] + CASE 
                           WHEN [Chars].[Let] = '.' THEN '' ELSE [Chars].[Let] 
                           END AS LEt,
               Chars.[Number]
        FROM   [Joined] AS J
               INNER JOIN
               [Chars]
               ON [Chars].[Number] = [J].[Number] + 1)
SELECT *
FROM   @Table AS T
WHERE  [T].[TextVal] IN (SELECT [Let]
                         FROM   [Joined])
          OR [T].[TextVal] LIKE '%'+(SELECT TOP 1 [Let] FROM
          [Joined] ORDER BY [Number] DESC )  +'%'            
                         ;

结果将是:

 TextVal
30
3069
306989878787

我能算出来。基本上,我只需要遍历字符串的每个字符并寻找匹配,直到不再找到为止。谢谢你的帮助!

/* ICD9 Lookup */
USE TSiData_Suite_LWHS_V11
DECLARE @String NVARCHAR (10)
DECLARE @Match NVARCHAR(10)
DECLARE @Substring NVARCHAR (10)
DECLARE @Description NVARCHAR(MAX) 
DECLARE @Length INT
DECLARE @Count INT
SET @String = '309.99999999'
/* Remove decimal place from string */
SET @String = REPLACE(@String,'.','')
/* Get lenth of string */
SET @Length = LEN(@String)
/* Initialize count */
SET @Count = 1
/* Get Substring */
SET @Substring = SUBSTRING(@String,1,@Count)
/* Start processing */
IF (@Length < 1 OR @String IS NULL)
    /* Validate @String */
    BEGIN
        SET @Description = 'No match found for string. String is not proper length.'
    END
ELSE IF ((SELECT COUNT(*) FROM LookupDiseases WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') < 1)
    /* Check for at least one match */
    BEGIN
        SET @Description = 'No match found for string.'
    END
ELSE
    /* Look for matching code */
    BEGIN
        WHILE ((SELECT COUNT(*) FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') <> 1 AND (@Count < @Length + 1))
        BEGIN
            /* Update substring value */
            SET @Substring = SUBSTRING(@String,1,@Count + 1)
            /* Increment @Count */
            SET @Count += 1
            /* Select the first matching code and get description */
            SELECT TOP(1) @Match =  LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC
        END
    END
PRINT @Match
PRINT @Description

最新更新