我在SQL Server 2019中的一个表中有四列(Col1、Col2、Col3、Col4(,其中包含数百万条记录。
在存储过程中,我必须传递四个输入参数@Col1、@Col2、@Col3、@Col4,并且无论列顺序如何,无论是否找到所有四个值,都应该返回成功/失败。例如,@Col1可以与Col2匹配。
Col2、Col3、Col4中的某些值可能为空,但Col1中始终有一些数据。
我已经准备了一些我测试过的样本数据和场景。
CREATE TABLE SampleData(Id INT IDENTITY(1,1), Col1 VARCHAR(20), Col2 VARCHAR(20), Col3 VARCHAR(20), Col4 VARCHAR(20))
INSERT INTO SampleData(Col1, Col2, Col3, Col4)
SELECT 'ABC','DEF','GHI','JKL' UNION
SELECT '123','456','789','100' UNION
SELECT 'ABC','XYZ','','' UNION
SELECT '9898','6565',NULL,NULL UNION
SELECT '989844','D656555','','' UNION
SELECT '8888','9999','7777','6666' UNION
SELECT '1234','5678','4321',NULL UNION
SELECT '465456465',NULL,NULL,NULL
存储过程
CREATE PROC dbo.ValidateSampleData(
@Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL
)
AS
BEGIN
Declare @a as bit = 0, @Message VARCHAR(50) = 'Data Not Matched'
if(@Col1 is NULL or @Col2 is NULL or @Col3 is NULL or @Col4 is NULL )
begin
set @a = 1
end
SELECT @Message = 'Data Matched '
FROM SampleData SD
where (Col1 in (@Col1,@Col2,@Col3,@Col4) or (Col1 is null and @a = 1))
and (Col2 in (@Col1,@Col2,@Col3,@Col4) or (Col2 is null and @a = 1))
and (Col3 in (@Col1,@Col2,@Col3,@Col4) or (Col3 is null and @a = 1))
and (Col4 in (@Col1,@Col2,@Col3,@Col4) or (Col4 is null and @a = 1))
and (select sum( (case when Col1 is null then 1 else 0 end)
+ (case when Col2 is null then 1 else 0 end)
+ (case when Col3 is null then 1 else 0 end)
+ (case when Col4 is null then 1 else 0 end)
) from SampleData where Id = SD.Id) =
(select sum( (case when @Col1 is null then 1 else 0 end)
+ (case when @Col2 is null then 1 else 0 end)
+ (case when @Col3 is null then 1 else 0 end)
+ (case when @Col4 is null then 1 else 0 end)))
SELECT @Message
END
这是我尝试的一些数据样本集
DECLARE
@Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL
--Case 0 - Should not matched
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = 'testtest' , @Col4 = ''
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 1
SELECT @Col1 = 'ABC', @Col2 = 'DEF' , @Col3 = 'GHI' , @Col4 = 'JKL'
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 2
SELECT @Col1 = 'DEF', @Col2 = 'JKL' , @Col3 = 'ABC' , @Col4 = 'GHI'
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 3
SELECT @Col1 = '123', @Col2 = '456' , @Col3 = '789' , @Col4 = '100'
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 4
SELECT @Col1 = '1234', @Col2 = '5678' , @Col3 = '4321' , @Col4 = NULL
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 5
SELECT @Col1 = '465456465', @Col2 = NULL , @Col3 = NULL , @Col4 = NULL
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 6
SELECT @Col1 = '8888', @Col2 = '9999' , @Col3 = '7777' , @Col4 = '6666'
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 7
SELECT @Col1 = '9898', @Col2 = '6565' , @Col3 = NULL , @Col4 = NULL
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 8
SELECT @Col1 = '989844', @Col2 = 'D656555' , @Col3 = '' , @Col4 = ''
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 9
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = '' , @Col4 = ''
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
--Case 10 - Should not matched
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = 'tet' , @Col4 = ''
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4
在某些情况下,我已经达到了预期的结果,但表中的值为NULL是不正确的情况0不应匹配
此外,它看起来并没有优化,而且在大数据集上运行速度非常慢。
简单地说,我想检查表中是否出现了所有4个值,NULL是否与NULL匹配?如果是,则返回匹配项,如果不是,则返回不匹配项?
我的方法是首先创建一个表,其中包含4个值的所有可能组合。这可以通过创建一个具有4行的派生表来完成,然后将其与自身连接4次,每次连接都确保您没有拾取已经拾取的行,即
DECLARE @Values TABLE
(
Col1 VARCHAR(20),
Col2 VARCHAR(20),
Col3 VARCHAR(20),
Col4 VARCHAR(20)
UNIQUE (Col1, Col2, Col3, Col4)
);
WITH Data AS
( SELECT Value, v.Ordinal
FROM (VALUES (1, @Col1), (2, @Col2), (3, @Col3), (4, @Col4)) AS v (Ordinal, Value)
)
INSERT @Values(Col1, Col2, Col3, Col4)
SELECT DISTINCT d1.Value, d2.Value, d3.Value, d4.Value
FROM Data AS d1
INNER JOIN Data AS d2
ON d2.Ordinal NOT IN (d1.Ordinal)
INNER JOIN Data AS d3
ON d3.Ordinal NOT IN (d1.Ordinal, d2.Ordinal)
INNER JOIN Data AS d4
ON d4.Ordinal NOT IN (d1.Ordinal, D2.Ordinal, d3.Ordinal);
有一个非常简单的例子,其中@Col1
="a",所有其他都是NULL
,你最终会得到这样的东西:
Col1 | Col2 | Col3 | Col4 |
---|---|---|---|
NULL | NULL | A | |
NULL | NULL | A | NULL|
空 | A | 空||
A | NULL | NULL | NULL