在SQL Server中工作。事实上,我有一个多列(20+(、数千行和多种数据类型的数据。我没有创建函数或过程的权限,但我可以处理临时表。以下是数据的简化片段。
drop table if exists #test
create table #test
(
Id INT,
BusinessName varchar(30),
Address1 varchar(100),
Address2 varchar (100),
Address3 varchar (100),
Postcode varchar (100),
City varchar (100),
Country varchar (20),
Turnover dec(20,2),
domain varchar (100)
)
insert #test values
(1,'A','nr1','street1', 'Court1', null, null, 'GB', 1.1,'www@1'),
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(1,'A', null, null, 'Court1', null, 'Paris', 'FR', 1.3, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2'),
(2,'B','nr2a',null, null, 'M2 3DW', 'Manchester', 'GB', 2, 'www@2')
对于每个ID,我需要选择具有最少null数量的记录。
如果最少的null数量相同,那么我需要选择Postcode不为null的记录,
如果仍然有多个选择,我需要选择一个城市不为空,
如果仍然有多个选项,我需要选择一个在所有三个功能(地址1、地址2、地址3(中空值最少的选项
否则我可以选择top1。
答案应该是:
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2')
以下是我对每一行的null进行计数的尝试,但首先,地址null没有正确相加。此外,我不知道现在如何根据优先级进行选择:
Drop table if exists #solution
select s.*
,case
when Address1 Is NULL
OR Address1 = ''
THEN 1
End Add1filled
,case
when Address2 Is NULL
OR Address2 = ''
THEN 1
End Add2filled
,case
when Address3 Is NULL
OR Address3 = ''
THEN 1
End Add3filled
,case
when Postcode Is NULL
OR Postcode = ''
THEN 1
End postcodefilled
,case
when City Is NULL
OR City = ''
THEN 1
End Cityfilled
into #solution
from #test s order by ID
Select
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
sum(Add1filled)+sum(Add2filled)+sum(Add3filled) [Addfilled],
postcodefilled,
Cityfilled
from #solution
group by
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
postcodefilled,
Cityfilled
有人能帮忙吗?
这有效,但不是一个明显的解决方案"计数";对于不同的数据类型,横向来说并不简单;您需要取消透视数据并将它们转换为所有相同的数据类型,然后COUNT
非NULL
值。这就是APPLY
正在做的事情。
然而,同样,在APPLY
中,您需要计算出";"优先级";基于行的NULL
值,2行具有相同数量的非NULL
值。因此;最高";优先级值具有最高值。
然后,最后,我们使用";旧的";CCD_ 7方法与CCD_;过滤器";到";第一个";每组中的行:
SELECT TOP (1) WITH TIES
T.*
FROM #test T
CROSS APPLY (SELECT COUNT(N) AS NotNulls,
MAX(P) AS Priority
FROM (VALUES(CASE WHEN T.BusinessName IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Address1 IS NOT NULL THEN 1 END,CASE WHEN T.Address1 IS NOT NULL THEN 1 END),
(CASE WHEN T.Address2 IS NOT NULL THEN 1 END,CASE WHEN T.Address2 IS NOT NULL THEN 2 END),
(CASE WHEN T.Address3 IS NOT NULL THEN 1 END,CASE WHEN T.Address3 IS NOT NULL THEN 3 END),
(CASE WHEN T.Postcode IS NOT NULL THEN 1 END,CASE WHEN T.Postcode IS NOT NULL THEN 4 END),
(CASE WHEN T.City IS NOT NULL THEN 1 END,CASE WHEN T.City IS NOT NULL THEN 5 END),
(CASE WHEN T.Country IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Turnover IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Domain IS NOT NULL THEN 1 END,NULL))V(N,P))G
ORDER BY ROW_NUMBER() OVER (PARTITION BY T.ID ORDER BY G.NotNulls DESC, G.Priority DESC, BusinessName ASC);
使用CASE WHEN
表达式检查NULL
。使用row_number()
确定每个Id
的优先级。
with cte as
(
select *,
rn = row_number() over
(
partition by Id
order by nulls,
(case when Postcode is not null then 1 else 2 end),
(case when City is not null then 1 else 2 end)
)
from #test t
cross apply
(
select nulls = case when BusinessName is null then 1 else 0 end
+ case when Address1 is null then 1 else 0 end
+ case when Address2 is null then 1 else 0 end
+ case when Address3 is null then 1 else 0 end
+ case when Postcode is null then 1 else 0 end
+ case when City is null then 1 else 0 end
+ case when Country is null then 1 else 0 end
+ case when Turnover is null then 1 else 0 end
+ case when domain is null then 1 else 0 end
) n
)
select *
from cte
where rn = 1
我假设有一个主键。这里有一个解决方案,可以扩展到n列,而不考虑数据类型:
WITH cte1 AS (
SELECT Pk
, Id
, count_all = COUNT(v)
, count_postcode = COUNT(CASE WHEN n = 'Postcode' AND v IS NOT NULL THEN 1 END)
, count_city = COUNT(CASE WHEN n = 'City' AND v IS NOT NULL THEN 1 END)
, count_address = COUNT(CASE WHEN n IN ('Address1', 'Address2', 'Address3') AND v IS NOT NULL THEN 1 END)
FROM #test
CROSS APPLY (values
('BusinessName', CASE WHEN BusinessName IS NOT NULL THEN 1 END),
('Address1', CASE WHEN Address1 IS NOT NULL THEN 1 END),
('Address2', CASE WHEN Address2 IS NOT NULL THEN 1 END),
('Address3', CASE WHEN Address3 IS NOT NULL THEN 1 END),
('Postcode', CASE WHEN Postcode IS NOT NULL THEN 1 END),
('City', CASE WHEN City IS NOT NULL THEN 1 END),
('Country', CASE WHEN Country IS NOT NULL THEN 1 END),
('Turnover', CASE WHEN Turnover IS NOT NULL THEN 1 END),
('Domain', CASE WHEN Domain IS NOT NULL THEN 1 END)
) AS x(n, v)
GROUP BY Pk, Id
), cte2 AS (
SELECT cte1.*
, rn = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY count_all DESC, count_postcode DESC, count_city DESC, count_address DESC)
FROM cte1
)
SELECT *
FROM cte2
WHERE rn= 1
它返回第2行(有邮政编码(和第4行(两个地址(。
我看不到使用apply
的值。只需将top (1)
与order by
:一起使用
select top (1) t.*
from #test
order by (case when BusinessName is not null then 1 else 0 end +
case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end +
case when Postcode is not null then 1 else 0 end +
case when City is not null then 1 else 0 end +
case when Country is not null then 1 else 0 end +
case when Turnover is not null then 1 else 0 end +
case when domain is not null then 1 else 0 end
) desc,
(case when Postcode is not null then 1 else 0 end) desc,
(case when City is not null then 1 else 0 end) desc,
(case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end
) desc;
所有的逻辑都在一个地方,不需要ORDER BY
之外的额外处理。