计算每一行加上其他条件的Null值



在SQL Server中工作。事实上,我有一个多列(20+(、数千行和多种数据类型的数据。我没有创建函数或过程的权限,但我可以处理临时表。以下是数据的简化片段。

drop table if exists #test
create table #test
(
Id INT,
BusinessName varchar(30),
Address1 varchar(100),
Address2 varchar (100),
Address3 varchar (100),
Postcode varchar (100),
City varchar (100),
Country varchar (20),
Turnover dec(20,2),
domain varchar (100)
)
insert #test values
(1,'A','nr1','street1', 'Court1', null, null, 'GB', 1.1,'www@1'),
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(1,'A', null, null, 'Court1', null, 'Paris', 'FR', 1.3, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2'),
(2,'B','nr2a',null, null, 'M2 3DW', 'Manchester', 'GB', 2, 'www@2')

对于每个ID,我需要选择具有最少null数量的记录。

如果最少的null数量相同,那么我需要选择Postcode不为null的记录,

如果仍然有多个选择,我需要选择一个城市不为空,

如果仍然有多个选项,我需要选择一个在所有三个功能(地址1、地址2、地址3(中空值最少的选项

否则我可以选择top1。

答案应该是:

(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2')

以下是我对每一行的null进行计数的尝试,但首先,地址null没有正确相加。此外,我不知道现在如何根据优先级进行选择:

Drop table if exists #solution 
select s.*
,case
when Address1 Is NULL   
OR Address1 = ''
THEN 1
End Add1filled
,case
when Address2 Is NULL   
OR Address2 = ''
THEN 1
End Add2filled
,case
when Address3 Is NULL   
OR Address3 = ''
THEN 1
End Add3filled
,case
when Postcode Is NULL   
OR Postcode = ''
THEN 1
End postcodefilled
,case
when City Is NULL   
OR City = ''
THEN 1
End Cityfilled
into #solution
from #test s order by ID
Select
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
sum(Add1filled)+sum(Add2filled)+sum(Add3filled) [Addfilled],
postcodefilled,
Cityfilled
from #solution
group by
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
postcodefilled,
Cityfilled

有人能帮忙吗?

有效,但不是一个明显的解决方案"计数";对于不同的数据类型,横向来说并不简单;您需要取消透视数据并将它们转换为所有相同的数据类型,然后COUNTNULL值。这就是APPLY正在做的事情。

然而,同样,在APPLY中,您需要计算出";"优先级";基于行的NULL值,2行具有相同数量的非NULL值。因此;最高";优先级值具有最高值。

然后,最后,我们使用";旧的";CCD_ 7方法与CCD_;过滤器";到";第一个";每组中的行:

SELECT TOP (1) WITH TIES
T.*
FROM #test T
CROSS APPLY (SELECT COUNT(N) AS NotNulls,
MAX(P) AS Priority
FROM (VALUES(CASE WHEN T.BusinessName IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Address1 IS NOT NULL THEN 1 END,CASE WHEN T.Address1 IS NOT NULL THEN 1 END),
(CASE WHEN T.Address2 IS NOT NULL THEN 1 END,CASE WHEN T.Address2 IS NOT NULL THEN 2 END),
(CASE WHEN T.Address3 IS NOT NULL THEN 1 END,CASE WHEN T.Address3 IS NOT NULL THEN 3 END),
(CASE WHEN T.Postcode IS NOT NULL THEN 1 END,CASE WHEN T.Postcode IS NOT NULL THEN 4 END),
(CASE WHEN T.City IS NOT NULL THEN 1 END,CASE WHEN T.City IS NOT NULL THEN 5 END),
(CASE WHEN T.Country IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Turnover IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Domain IS NOT NULL THEN 1 END,NULL))V(N,P))G
ORDER BY ROW_NUMBER() OVER (PARTITION BY T.ID ORDER BY G.NotNulls DESC, G.Priority DESC, BusinessName ASC);

使用CASE WHEN表达式检查NULL。使用row_number()确定每个Id的优先级。

with cte as
(
select  *,
rn  = row_number() over 
(
partition by Id
order by nulls, 
(case when Postcode is not null then 1 else 2 end),
(case when City is not null then 1 else 2 end)
)

from    #test t 
cross apply
(
select  nulls   = case when BusinessName is null then 1 else 0 end
+ case when Address1 is null then 1 else 0 end
+ case when Address2 is null then 1 else 0 end
+ case when Address3 is null then 1 else 0 end
+ case when Postcode is null then 1 else 0 end
+ case when City is null then 1 else 0 end
+ case when Country is null then 1 else 0 end
+ case when Turnover is null then 1 else 0 end
+ case when domain is null then 1 else 0 end
) n
)
select  *
from    cte 
where   rn  = 1

我假设有一个主键。这里有一个解决方案,可以扩展到n列,而不考虑数据类型:

WITH cte1 AS (
SELECT Pk
, Id
, count_all = COUNT(v)
, count_postcode = COUNT(CASE WHEN n = 'Postcode' AND v IS NOT NULL THEN 1 END)
, count_city = COUNT(CASE WHEN n = 'City' AND v IS NOT NULL THEN 1 END)
, count_address = COUNT(CASE WHEN n IN ('Address1', 'Address2', 'Address3') AND v IS NOT NULL THEN 1 END)
FROM #test
CROSS APPLY (values
('BusinessName', CASE WHEN BusinessName IS NOT NULL THEN 1 END),
('Address1',     CASE WHEN Address1     IS NOT NULL THEN 1 END),
('Address2',     CASE WHEN Address2     IS NOT NULL THEN 1 END),
('Address3',     CASE WHEN Address3     IS NOT NULL THEN 1 END),
('Postcode',     CASE WHEN Postcode     IS NOT NULL THEN 1 END),
('City',         CASE WHEN City         IS NOT NULL THEN 1 END),
('Country',      CASE WHEN Country      IS NOT NULL THEN 1 END),
('Turnover',     CASE WHEN Turnover     IS NOT NULL THEN 1 END),
('Domain',       CASE WHEN Domain       IS NOT NULL THEN 1 END) 
) AS x(n, v)
GROUP BY Pk, Id
), cte2 AS (
SELECT cte1.*
, rn = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY count_all DESC, count_postcode DESC, count_city DESC, count_address DESC)
FROM cte1
)
SELECT *
FROM cte2
WHERE rn= 1

它返回第2行(有邮政编码(和第4行(两个地址(。

我看不到使用apply的值。只需将top (1)order by:一起使用

select top (1) t.*
from #test
order by (case when BusinessName is not null then 1 else 0 end +
case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end +
case when Postcode is not null then 1 else 0 end +
case when City is not null then 1 else 0 end +
case when Country is not null then 1 else 0 end +
case when Turnover is not null then 1 else 0 end +
case when domain is not null then 1 else 0 end
) desc,
(case when Postcode is not null then 1 else 0 end) desc,
(case when City is not null then 1 else 0 end) desc,
(case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end
) desc;

所有的逻辑都在一个地方,不需要ORDER BY之外的额外处理。

相关内容

  • 没有找到相关文章

最新更新