返回ROW_NUMBER()中每个字段的非空值



考虑以下数据:

CREATE TABLE #FakeTable (
PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int
)
INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, 'fake@noemail.com', NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', 'fake@noemail.com', '123 Fake Street', 9),
('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
('7891011', 'Bob Weir', NULL, NULL, 'nope@fake.com', '456 Nope Street', 10),
('7891011', 'Bob Weir', '19471610', NULL, 'nope@fake.com', '456 Nope Street', 6)

然后是:

PersonID  PatientNM     BirthDTS       PhoneNBR      EmailAddress        StreetAddress    KeyFieldCNT
123456    Jerry Garcia   NULL          NULL          fake@noemail.com    NULL               11
123456    Jerry Garcia   19421401      999-999-9999  NULL                NULL               11
123456    Jerry Garcia   19421401      999-999-9999  fake@noemail.com    123 Fake Street    9
7891011   Bob Weir       19471610      NULL          NULL                456 Nope Street    11
7891011   Bob Weir       NULL          NULL          nope@fake.com       456 Nope Street    10
7891011   Bob Weir       19471610      NULL          nope@fake.com       456 Nope Street    6

我想使用ROW_NUMBER()并按PersonID分区,按KeyFieldCNT排序,以便建立一个排名。我希望排名最高的行只返回每个列的非空值,如果该列恰好在另一个排名较低的行中填充。

使用普通的ROW_NUMBER()和分区/order by,结果如下:

WITH CTE AS (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT DESC) AS RowNBR
FROM #FakeTable
)
SELECT *
FROM CTE
WHERE RowNBR = '1'

结果:

PersonID  PatientNM     BirthDTS  PhoneNBR       EmailAddress      StreetAddress      KeyFieldCNT   RowNBR
123456    Jerry Garcia  NULL      NULL           fake@noemail.com    NULL               11          1
7891011   Bob Weir      19471610  NULL           NULL                456 Nope Street    11          1

我想要的结果是:

PersonID  PatientNM     BirthDTS  PhoneNBR       EmailAddress      StreetAddress      KeyFieldCNT   RowNBR
123456    Jerry Garcia  19421401  999-999-9999   fake@noemail.com  123 Fake Street    11            1
7891011   Bob Weir      19471610  NULL           nope@fake.com     456 Nope Street    11            1

有人知道怎么做吗?我曾尝试以此为例,但当我推断到不止一个感兴趣的字段时,它并没有给我我需要的东西(SQL -在RANK()或ROW_NUMBER()之后返回第一个非NULL值)。

你可以试试这个:

CREATE TABLE #FakeTable (
PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int
)
INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, 'fake@noemail.com', NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', 'fake@noemail.com', '123 Fake Street', 9),
('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
('7891011', 'Bob Weir', NULL, NULL, 'nope@fake.com', '456 Nope Street', 10),
('7891011', 'Bob Weir', '19471610', NULL, 'nope@fake.com', '456 Nope Street', 6)
SELECT ft.PersonID,
ft.PatientNM,
MAX(ft.BirthDTS) AS BirthDTS,
MAX(ft.PhoneNBR) AS PhoneNBR,
MAX(ft.EmailAddress) AS EmailAddress,
MAX(ft.StreetAddress) AS StreetAddress,
MAX(ft.KeyFieldCNT) AS  KeyFieldCNT
FROM #FakeTable AS ft
GROUP BY PersonID, ft.PatientNM

你有问题。当使用row_number()进行过滤时,所有列的值将来自同一行. 这实际上是一个特性;这是一个好东西,主要原因是我一直在慢慢地用row_number()代替许多旧的GROUP BY代码。

但是对于人员ID 123456,您希望看到PhoneNBREmailAddress的非null数据,并且还显示KeyFieldCNT的最高值(11)。

表中没有这样的行

KeyFieldCount = 11的person 123456的两行都缺少一个或另一个字段。

我怀疑KeyFieldCnt是由应用程序维护的,以显示填充了多少特定字段,而您希望显示每个用户的数据最多的行。不幸的是,该行缺少了您所关心的一个字段,即使该行有更多的其他字段。

有办法解决这个问题,但如果你需要能够合并数据与多个可能缺失的字段,你将不得不再次连接回表的每个可能的字段…可能相当缓慢和冗长。相反,这似乎是一个数据问题。表没有将所有数据存储在您期望的所有位置,更好的解决方案是查找并恢复/修复表中丢失的数据。

假设PatientID,PatientNMKeyFieldCNT不能为NULL,您可以尝试:

WITH CTE AS (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT) AS RowNBR
FROM FakeTable
)
SELECT CTE.PersonID, CTE.PatientNM, 
coalesce(CTE.BirthDTS, max(ft.BirthDTS)) BirthDTS, 
coalesce(CTE.PhoneNBR, max(ft.PhoneNBR)) PhoneNBR,
coalesce(CTE.EmailAddress, max(ft.EmailAddress)) EmailAddress,
coalesce(CTE.StreetAddress, max(ft.StreetAddress)) StreetAddress,
CTE.KeyFieldCNT
FROM CTE
LEFT JOIN FakeTable ft ON ft.PersonID = CTE.PersonID
WHERE RowNBR = 1
GROUP BY CTE.PersonID, CTE.PatientNM, CTE.BirthDTS, CTE.PhoneNBR,
CTE.EmailAddress, CTE.StreetAddress, CTE.KeyFieldCNT

看这里:

https://dbfiddle.uk/Nqz61kul

只是要注意:这是混合来自不同行的值,这几乎总是一个坏主意,并且没有有意义的选择,哪一行有优先级来填充缺失的数据。

A更正确方法涉及对您关心的每个字段进行额外的扫描,因此您可以指定优先级和其他标准来填充缺失的值。但是要长得多慢得要命

我尝试用

单独选择每一列
SELECT TOP 1 <col> FROM ... WHERE <col> IS NOT NULL ORDER BY ... DESC

,它似乎奏效了,如下所示:

;WITH X (PersonID) AS (
SELECT DISTINCT PersonID FROM #FakeTable
)
SELECT X.PersonID
, (SELECT TOP 1 F.PatientNM     FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND   F.PatientNM     IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
AS     PatientNM
, (SELECT TOP 1 F.BirthDTS      FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND   F.BirthDTS      IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
AS     BirthDTS
, (SELECT TOP 1 F.PhoneNBR      FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND   F.PhoneNBR      IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
AS     PhoneNBR
, (SELECT TOP 1 F.EmailAddress  FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND   F.EmailAddress  IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
AS     EmailAddress
, (SELECT TOP 1 F.StreetAddress FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND   F.StreetAddress IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
AS     StreetAddress
, (SELECT TOP 1 F.KeyFieldCNT   FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND   F.KeyFieldCNT   IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
AS     KeyFieldCNT
FROM X

结果:

tbody> <<tr>
PersonIDPatientNMBirthDTSPhoneNBREmailAddressStreetAddressKeyFieldCNT
123456Jerry Garcia19421401999-999-9999fake@noemail.com假街123号11
7891011鲍勃堰19471610nope@fake.com不街456号11

当您想要/需要所有(或大多数)非null值在顶部的记录时,您可以这样做:

select *,
IIF(PersonId is null,0,1) +
IIF(PatientNM is null,0,1) +
IIF(BirthDTS is null,0,1) +
IIF(PhoneNBR is null,0,1) +
IIF(EmailAddress is null,0,1) +
IIF(StreetAddress is null,0,1) +
IIF(KeyFieldCNT is null,0,1) ColumnsWithData
from #FakeTable
ORDER BY 
IIF(PersonId is null,0,1) +
IIF(PatientNM is null,0,1) +
IIF(BirthDTS is null,0,1) +
IIF(PhoneNBR is null,0,1) +
IIF(EmailAddress is null,0,1) +
IIF(StreetAddress is null,0,1) +
IIF(KeyFieldCNT is null,0,1)     DESC
;

看到:DBFIDDLE

相关内容

  • 没有找到相关文章

最新更新