考虑以下数据:
CREATE TABLE #FakeTable (
PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int
)
INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, 'fake@noemail.com', NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', 'fake@noemail.com', '123 Fake Street', 9),
('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
('7891011', 'Bob Weir', NULL, NULL, 'nope@fake.com', '456 Nope Street', 10),
('7891011', 'Bob Weir', '19471610', NULL, 'nope@fake.com', '456 Nope Street', 6)
然后是:
PersonID PatientNM BirthDTS PhoneNBR EmailAddress StreetAddress KeyFieldCNT
123456 Jerry Garcia NULL NULL fake@noemail.com NULL 11
123456 Jerry Garcia 19421401 999-999-9999 NULL NULL 11
123456 Jerry Garcia 19421401 999-999-9999 fake@noemail.com 123 Fake Street 9
7891011 Bob Weir 19471610 NULL NULL 456 Nope Street 11
7891011 Bob Weir NULL NULL nope@fake.com 456 Nope Street 10
7891011 Bob Weir 19471610 NULL nope@fake.com 456 Nope Street 6
我想使用ROW_NUMBER()并按PersonID分区,按KeyFieldCNT排序,以便建立一个排名。我希望排名最高的行只返回每个列的非空值,如果该列恰好在另一个排名较低的行中填充。
使用普通的ROW_NUMBER()和分区/order by,结果如下:
WITH CTE AS (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT DESC) AS RowNBR
FROM #FakeTable
)
SELECT *
FROM CTE
WHERE RowNBR = '1'
结果:
PersonID PatientNM BirthDTS PhoneNBR EmailAddress StreetAddress KeyFieldCNT RowNBR
123456 Jerry Garcia NULL NULL fake@noemail.com NULL 11 1
7891011 Bob Weir 19471610 NULL NULL 456 Nope Street 11 1
我想要的结果是:
PersonID PatientNM BirthDTS PhoneNBR EmailAddress StreetAddress KeyFieldCNT RowNBR
123456 Jerry Garcia 19421401 999-999-9999 fake@noemail.com 123 Fake Street 11 1
7891011 Bob Weir 19471610 NULL nope@fake.com 456 Nope Street 11 1
有人知道怎么做吗?我曾尝试以此为例,但当我推断到不止一个感兴趣的字段时,它并没有给我我需要的东西(SQL -在RANK()或ROW_NUMBER()之后返回第一个非NULL值)。
你可以试试这个:
CREATE TABLE #FakeTable (
PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int
)
INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, 'fake@noemail.com', NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', 'fake@noemail.com', '123 Fake Street', 9),
('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
('7891011', 'Bob Weir', NULL, NULL, 'nope@fake.com', '456 Nope Street', 10),
('7891011', 'Bob Weir', '19471610', NULL, 'nope@fake.com', '456 Nope Street', 6)
SELECT ft.PersonID,
ft.PatientNM,
MAX(ft.BirthDTS) AS BirthDTS,
MAX(ft.PhoneNBR) AS PhoneNBR,
MAX(ft.EmailAddress) AS EmailAddress,
MAX(ft.StreetAddress) AS StreetAddress,
MAX(ft.KeyFieldCNT) AS KeyFieldCNT
FROM #FakeTable AS ft
GROUP BY PersonID, ft.PatientNM
你有问题。当使用row_number()
进行过滤时,所有列的值将来自同一行. 这实际上是一个特性;这是一个好东西,主要原因是我一直在慢慢地用row_number()
代替许多旧的GROUP BY
代码。
但是对于人员ID 123456,您希望看到PhoneNBR
和EmailAddress
的非null数据,并且还显示KeyFieldCNT
的最高值(11
)。
表中没有这样的行
KeyFieldCount = 11的person 123456的两行都缺少一个或另一个字段。
我怀疑KeyFieldCnt
是由应用程序维护的,以显示填充了多少特定字段,而您希望显示每个用户的数据最多的行。不幸的是,该行缺少了您所关心的一个字段,即使该行有更多的其他字段。
有办法解决这个问题,但如果你需要能够合并数据与多个可能缺失的字段,你将不得不再次连接回表的每个可能的字段…可能相当缓慢和冗长。相反,这似乎是一个数据问题。表没有将所有数据存储在您期望的所有位置,更好的解决方案是查找并恢复/修复表中丢失的数据。
假设PatientID
,PatientNM
和KeyFieldCNT
不能为NULL,您可以尝试:
WITH CTE AS (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT) AS RowNBR
FROM FakeTable
)
SELECT CTE.PersonID, CTE.PatientNM,
coalesce(CTE.BirthDTS, max(ft.BirthDTS)) BirthDTS,
coalesce(CTE.PhoneNBR, max(ft.PhoneNBR)) PhoneNBR,
coalesce(CTE.EmailAddress, max(ft.EmailAddress)) EmailAddress,
coalesce(CTE.StreetAddress, max(ft.StreetAddress)) StreetAddress,
CTE.KeyFieldCNT
FROM CTE
LEFT JOIN FakeTable ft ON ft.PersonID = CTE.PersonID
WHERE RowNBR = 1
GROUP BY CTE.PersonID, CTE.PatientNM, CTE.BirthDTS, CTE.PhoneNBR,
CTE.EmailAddress, CTE.StreetAddress, CTE.KeyFieldCNT
看这里:
https://dbfiddle.uk/Nqz61kul只是要注意:这是混合来自不同行的值,这几乎总是一个坏主意,并且没有有意义的选择,哪一行有优先级来填充缺失的数据。
A更正确方法涉及对您关心的每个字段进行额外的扫描,因此您可以指定优先级和其他标准来填充缺失的值。但是要长得多慢得要命
我尝试用
单独选择每一列SELECT TOP 1 <col> FROM ... WHERE <col> IS NOT NULL ORDER BY ... DESC
,它似乎奏效了,如下所示:
;WITH X (PersonID) AS (
SELECT DISTINCT PersonID FROM #FakeTable
)
SELECT X.PersonID
, (SELECT TOP 1 F.PatientNM FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.PatientNM IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS PatientNM
, (SELECT TOP 1 F.BirthDTS FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.BirthDTS IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS BirthDTS
, (SELECT TOP 1 F.PhoneNBR FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.PhoneNBR IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS PhoneNBR
, (SELECT TOP 1 F.EmailAddress FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.EmailAddress IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS EmailAddress
, (SELECT TOP 1 F.StreetAddress FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.StreetAddress IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS StreetAddress
, (SELECT TOP 1 F.KeyFieldCNT FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.KeyFieldCNT IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS KeyFieldCNT
FROM X
结果:
PersonID | PatientNM | BirthDTS | PhoneNBR | EmailAddress | StreetAddress | KeyFieldCNT | 123456 | Jerry Garcia | 19421401 | 999-999-9999 | fake@noemail.com | 假街123号 | 11 |
---|---|---|---|---|---|---|
7891011 | 鲍勃堰 | 19471610 | 空 | nope@fake.com | 不街456号 | 11 |
当您想要/需要所有(或大多数)非null值在顶部的记录时,您可以这样做:
select *,
IIF(PersonId is null,0,1) +
IIF(PatientNM is null,0,1) +
IIF(BirthDTS is null,0,1) +
IIF(PhoneNBR is null,0,1) +
IIF(EmailAddress is null,0,1) +
IIF(StreetAddress is null,0,1) +
IIF(KeyFieldCNT is null,0,1) ColumnsWithData
from #FakeTable
ORDER BY
IIF(PersonId is null,0,1) +
IIF(PatientNM is null,0,1) +
IIF(BirthDTS is null,0,1) +
IIF(PhoneNBR is null,0,1) +
IIF(EmailAddress is null,0,1) +
IIF(StreetAddress is null,0,1) +
IIF(KeyFieldCNT is null,0,1) DESC
;
看到:DBFIDDLE