检索行间特定列不同的行



我有以下SQL来获取最高(最大)和第二高的日期行。在某些情况下,"旧行"和"新行"中的唯一区别是生效日期。我想做的是只显示每个 EMPLID 的新旧行,其中"旧"和"新"行之间的BANK_CD和/或ACCOUNT_NUM不同。

WITH CTE AS ( 
SELECT A.EMPLID 
, C.VENDOR_ID
, B.FIRST_NAME 
, B.LAST_NAME 
, A.BANK_CD 
, A.ACCOUNT_NUM 
, A.ACCOUNT_TYPE 
, A.PRIORITY
, A.LAST_UPDATE_DATE 
, A.EFFDT 
, MAX(A.LAST_UPDATE_DATE) OVER(PARTITION BY A.EMPLID) AS MAX_UPDATE_DATE 
, ROW_NUMBER() OVER(PARTITION BY A.EMPLID 
ORDER BY A.EFFDT DESC 
, D.EFFDT DESC) AS RN 
FROM PS_DIRECT_DEPOSIT D INNER JOIN PS_DIR_DEP_DISTRIB A ON A.EMPLID = D.EMPLID 
AND A.EFFDT = D.EFFDT INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID INNER JOIN PS_GHS_DIR_DEP_VND C ON C.EMPLID = A.EMPLID 
INNER JOIN PS_DIR_DEP_DISTRIB E ON E.EMPLID = A.EMPLID AND E.EFFDT = A.EFFDT AND E.PRIORITY = A.PRIORITY 
WHERE B.EMPL_STATUS NOT IN ('T','R','D') 
AND ((A.DEPOSIT_TYPE = 'P' 
AND A.AMOUNT_PCT = 100) 
OR A.PRIORITY = 999 
OR A.DEPOSIT_TYPE = 'B') 
AND D.EFF_STATUS = 'A' )
SELECT CASE WHEN RN = 1 THEN 'NEW ROW' WHEN RN = 2 THEN 'OLD ROW' END AS 
'ROW_TYPE' , * 
FROM CTE 
WHERE RN IN (1, 2) 
AND MAX_UPDATE_DATE >= GETDATE() - 8

这是我现在得到的输出示例:

ROW_TYPE    EMPLID     VENDOR_ID    FIRST_NAME    LAST_NAME    BANK_CD    ACCOUNT_NUM    ACCOUNT_TYPE    PRIORITY    LAST_UPDATE_DATE    EFFDT       MAX_UPDATE_DATE    RN
NEW ROW     12345      XYZ123       John          Smith        111111122  45678          C               999         03/12/2019          03/12/2019  03/12/2019         1
OLD ROW     12345      XYZ123       John          Smith        111111122  45678          C               999         10/25/2017          10/25/2017  10/25/2017         2
NEW ROW     47831      A86464       Samm          Bulle        754566654  98865          C               999         03/12/2019          06/08/2018  03/12/2019         1
OLD ROW     47831      A86464       Samm          Bulle        754566654  45678          C               999         10/25/2017          06/08/2018  10/25/2017         2
NEW ROW     32456      KG4561       Kilo          Renne        875123311  32146          C               300         09/02/2018          09/02/2018  09/02/2018         1
OLD ROW     32456      KG4561       Kilo          Renne        971215477  78131          C               310         12/21/2017          12/21/2017  12/21/2017         2

在上面的例子中,我不想输出前两组 EMPLID 行(前 4 行),因为它们在新旧行之间具有相同的BANK_CD和ACCOUNT_NUM。我想保留在我的输出中的最后一组行,因为您可以看到BANK_CD和ACCOUNT_NUM是不同的(可以包含其中)。

我尝试为PS_DIR_DEP_DISTRIB添加一个自连接,然后在 WHERE 子句中添加 AND(A.BANK_CD <> E.BANK_CD OR A.ACCOUNT_NUM <> E.ACCOUNT_NUM),如下所示:

WITH CTE AS ( 
SELECT A.EMPLID 
--, C.VENDOR_ID
, B.FIRST_NAME 
, B.LAST_NAME 
, A.BANK_CD 
, A.ACCOUNT_NUM 
, A.ACCOUNT_TYPE 
, A.PRIORITY
, A.LAST_UPDATE_DATE 
, A.EFFDT 
, MAX(A.LAST_UPDATE_DATE) OVER(PARTITION BY A.EMPLID) AS MAX_UPDATE_DATE 
, ROW_NUMBER() OVER(PARTITION BY A.EMPLID 
ORDER BY A.EFFDT DESC 
, D.EFFDT DESC) AS RN 
FROM PS_DIRECT_DEPOSIT D INNER JOIN PS_DIR_DEP_DISTRIB A ON A.EMPLID = D.EMPLID 
AND A.EFFDT = D.EFFDT INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID INNER JOIN PS_GHS_DIR_DEP_VND C ON C.EMPLID = A.EMPLID 
---ADDING SELF JOIN BELOW---
INNER JOIN PS_DIR_DEP_DISTRIB E ON E.EMPLID = A.EMPLID AND E.EFFDT = 
A.EFFDT AND E.PRIORITY = A.PRIORITY 
WHERE B.EMPL_STATUS NOT IN ('T','R','D') 
AND ((A.DEPOSIT_TYPE = 'P' 
AND A.AMOUNT_PCT = 100) 
OR A.PRIORITY = 999 
OR A.DEPOSIT_TYPE = 'B') 
AND D.EFF_STATUS = 'A' 
---ADDING NEW WHERE CONDITION BELOW---
AND (E.ACCOUNT_NUM <> A.ACCOUNT_NUM OR E.BANK_CD <> A.BANK_CD )) 
SELECT CASE WHEN RN = 1 THEN 'NEW ROW' WHEN RN = 2 THEN 'OLD ROW' END AS 'ROW_TYPE' 
, * 
FROM CTE 
WHERE RN IN (1, 2) 
AND MAX_UPDATE_DATE >= GETDATE() - 8

但是,当我进行上述更改时,我根本没有返回任何数据。我做错了什么?

19-3-20 编辑:

按照 Tarek 在下面建议的答案,我似乎缺少一些我希望在最终输出中的行。似乎BANK_CD_prevACCOUNT_NUM_prev添加的新字段在我希望显示它们的行中返回NULL值,因此最终查询没有检索它们。如何处理?

ROW_TYPE    EMPLID    FIRST_NAME    LAST_NAME   BANK_CD       ACCOUNT_NUM     ACCOUNT_TYPE  PRIORITY    LAST_UPDATE_DATE       EFFDT        MAX_UPDATE_DATE  RN    BANK_CD_prev   ACCOUNT_NUM_prev  
NEW ROW     56789     Test          User        874556411     54765           C             999         2019-02-28             2019-02-28   2019-02-28       1     NULL           NULL
OLD ROW     56789     Test          User        874556411     98451           C             999         2017-10-09             2017-10-09   2017-10-09       1     874556411      54765
NEW ROW     56789     Sampl         Test        756561623     46331           C             999         2018-03-12             2018-03-12   2018-03-12       1     NULL           NULL
OLD ROW     56789     Test          User        756561623     46331           C             999         2015-05-18             2015-05-18   2015-05-18       1     756561623      46331

请注意,上面的内容在没有(AND BANK_CD <> BANK_CD_prev AND ACCOUNT_NUM <> ACCOUNT_NUM_prev的最终条件的情况下运行查询,这样我就可以明白为什么我缺少一些行。在这种情况下,您可以看到第一EMPLID的两行,其中ACCOUNT_NUM不同,因此我应该输出两行。第 2EMPLID(最后两行)略有不同,因为两行的BANK_CDACCOUNT_NUM相同,所以我不希望这是输出(事实并非如此,所以这是正确的),但仍然显示NULL,如第一行。

19-3-20 编辑2:你可以看到BANK_CD_COUNTACCOUNT_NUM_COUNT里面有一些奇怪的数字,我认为这导致了问题。

ROW_TYPE    EMPLID  FIRST_NAME  LAST_NAME   BANK_CD     ACCOUNT_NUM  ACCOUNT_TYPE   PRIORITY    LAST_UPDATE_DATE    EFFDT        MAX_UPDATE_DATE    RN   BANK_CD_COUNT  ACCOUNT_NUM_COUNT
NEW ROW     812682  Test        User        031308302   8675309      C              999         2019-03-09          2019-03-09   2019-03-09          1   1101            1
OLD ROW     812682  Test        User        231379393   0001236408   C           999            2018-04-11          2018-03-03   2019-03-09          2   476            1

此链接包含"创建表"和"插入"脚本,供任何可以帮助复制此脚本的人使用。

https://pastebin.com/cZLhMmet

一种方法是使用LEADLAG函数。我们在这里需要两者。

这是您的原始查询,我按日期格式化并注释掉了过滤器。

WITH 
CTE
AS 
(
SELECT
A.EMPLID
,C.VENDOR_ID
,B.FIRST_NAME
,B.LAST_NAME
,A.BANK_CD
,A.ACCOUNT_NUM
,A.ACCOUNT_TYPE
,A.PRIORITY
,A.LAST_UPDATE_DATE
,A.EFFDT
,MAX(A.LAST_UPDATE_DATE) OVER (PARTITION BY A.EMPLID) AS MAX_UPDATE_DATE
,ROW_NUMBER() OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC, D.EFFDT DESC) AS RN
FROM 
PS_DIRECT_DEPOSIT D
INNER JOIN PS_DIR_DEP_DISTRIB A 
ON A.EMPLID = D.EMPLID
AND A.EFFDT = D.EFFDT
INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID
INNER JOIN PS_GHS_DIR_DEP_VND C ON C.EMPLID = A.EMPLID
INNER JOIN PS_DIR_DEP_DISTRIB E 
ON E.EMPLID = A.EMPLID
AND E.EFFDT = A.EFFDT
AND E.PRIORITY = A.PRIORITY
WHERE 
B.EMPL_STATUS NOT IN ('T','R','D')
AND 
(
(
A.DEPOSIT_TYPE = 'P'
AND A.AMOUNT_PCT = 100
)
OR A.PRIORITY = 999
OR A.DEPOSIT_TYPE = 'B'
)
AND D.EFF_STATUS = 'A'
)
SELECT 
CASE 
WHEN RN = 1 THEN 'NEW ROW'
WHEN RN = 2 THEN 'OLD ROW'
END AS 'ROW_TYPE'
,*
FROM CTE
WHERE 
RN IN (1,2)
--AND MAX_UPDATE_DATE >= GETDATE() - 8
;

此查询根据您的示例数据返回以下结果:

+----------+---------+-----------+------------+-----------+----------+-------------+--------------+----------+-------------------------+-------------------------+-------------------------+----+
| ROW_TYPE | EMPLID  | VENDOR_ID | FIRST_NAME | LAST_NAME | BANK_CD  | ACCOUNT_NUM | ACCOUNT_TYPE | PRIORITY |    LAST_UPDATE_DATE     |          EFFDT          |     MAX_UPDATE_DATE     | RN |
+----------+---------+-----------+------------+-----------+----------+-------------+--------------+----------+-------------------------+-------------------------+-------------------------+----+
| NEW ROW  |  097432 | 3471B     | H          | Adam      | 09146115 |    13404891 | C            |      999 | 2016-06-10 00:00:00.000 | 2016-06-10 00:00:00.000 | 2016-06-10 00:00:00.000 |  1 |
| OLD ROW  |  097432 | 3471B     | H          | Adam      | 09146115 |    13404891 | C            |      999 | 2016-06-10 00:00:00.000 | 2014-11-05 00:00:00.000 | 2016-06-10 00:00:00.000 |  2 |
| NEW ROW  |  100765 | 1272B     | V          | Milo      |  2358414 |      040925 | S            |      310 | 2014-03-05 00:00:00.000 | 2011-04-27 00:00:00.000 | 2014-03-05 00:00:00.000 |  1 |
| OLD ROW  |  100765 | 1272B     | V          | Milo      |  2358414 |      040925 | S            |      300 | 2014-03-05 00:00:00.000 | 2001-08-23 00:00:00.000 | 2014-03-05 00:00:00.000 |  2 |
| NEW ROW  | 1045632 | 3870A     | V          | Olo       |  2345794 |   179410860 | C            |      999 | 2018-08-16 00:00:00.000 | 2018-08-16 00:00:00.000 | 2018-08-16 00:00:00.000 |  1 |
| OLD ROW  | 1045632 | 3870A     | V          | Olo       |  2345794 |   179410860 | C            |      310 | 2011-02-25 00:00:00.000 | 2011-02-25 00:00:00.000 | 2018-08-16 00:00:00.000 |  2 |
| NEW ROW  |   21345 | 12345A    | J          | Smith     |  0224547 |    59167824 | C            |      999 | 2019-02-28 00:00:00.000 | 2019-02-28 00:00:00.000 | 2019-02-28 00:00:00.000 |  1 |
| OLD ROW  |   21345 | 12345A    | J          | Smith     |  0224547 |      591678 | C            |      999 | 2017-11-08 00:00:00.000 | 2017-10-17 00:00:00.000 | 2019-02-28 00:00:00.000 |  2 |
| NEW ROW  |   26110 | 1272B     | S          | Sams      |  8208302 |      822328 | C            |      999 | 2019-02-08 00:00:00.000 | 2019-02-08 00:00:00.000 | 2019-02-08 00:00:00.000 |  1 |
| OLD ROW  |   26110 | 1272B     | S          | Sams      |  8208302 |      822328 | C            |      315 | 2014-03-05 00:00:00.000 | 2012-07-30 00:00:00.000 | 2019-02-08 00:00:00.000 |  2 |
+----------+---------+-----------+------------+-----------+----------+-------------+--------------+----------+-------------------------+-------------------------+-------------------------+----+

我把它包括在这里,所以你可以看到下面的区别。

这是添加了LEADLAG函数的新查询(我们需要获取上一行和下一行),并在WHERE子句中带有额外的过滤器。

LEADLAG到达窗口末尾时返回 NULL。例如,LAG(prev 值)将为窗口的第一行返回 NULL,因为还没有"上一个"行。

因此,对于第一行 (rn=1),我们需要将其值与"next"进行比较。

对于第二行 (rn=2),我们需要将其值与 "prev" 进行比较。

最终查询

WITH 
CTE
AS 
(
SELECT
A.EMPLID
,C.VENDOR_ID
,B.FIRST_NAME
,B.LAST_NAME
,A.BANK_CD
,A.ACCOUNT_NUM
,A.ACCOUNT_TYPE
,A.PRIORITY
,A.LAST_UPDATE_DATE
,A.EFFDT
,MAX(A.LAST_UPDATE_DATE) OVER (PARTITION BY A.EMPLID) AS MAX_UPDATE_DATE
,ROW_NUMBER() OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC, D.EFFDT DESC) AS RN
,LAG(A.BANK_CD) OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC, D.EFFDT DESC) AS BANK_CD_prev
,LEAD(A.BANK_CD) OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC, D.EFFDT DESC) AS BANK_CD_next
,LAG(A.ACCOUNT_NUM) OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC, D.EFFDT DESC) AS ACCOUNT_NUM_prev
,LEAD(A.ACCOUNT_NUM) OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC, D.EFFDT DESC) AS ACCOUNT_NUM_next
FROM 
PS_DIRECT_DEPOSIT D
INNER JOIN PS_DIR_DEP_DISTRIB A 
ON A.EMPLID = D.EMPLID
AND A.EFFDT = D.EFFDT
INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID
INNER JOIN PS_GHS_DIR_DEP_VND C ON C.EMPLID = A.EMPLID
INNER JOIN PS_DIR_DEP_DISTRIB E 
ON E.EMPLID = A.EMPLID
AND E.EFFDT = A.EFFDT
AND E.PRIORITY = A.PRIORITY
WHERE 
B.EMPL_STATUS NOT IN ('T','R','D')
AND 
(
(
A.DEPOSIT_TYPE = 'P'
AND A.AMOUNT_PCT = 100
)
OR A.PRIORITY = 999
OR A.DEPOSIT_TYPE = 'B'
)
AND D.EFF_STATUS = 'A'
)
SELECT 
CASE 
WHEN RN = 1 THEN 'NEW ROW'
WHEN RN = 2 THEN 'OLD ROW'
END AS 'ROW_TYPE'
,*
FROM CTE
WHERE 
RN IN (1,2)
AND 
(
(
rn = 1
AND BANK_CD <> BANK_CD_next
)
OR
(
rn = 1
AND ACCOUNT_NUM <> ACCOUNT_NUM_next
)
OR
(
rn = 2
AND BANK_CD <> BANK_CD_prev
)
OR
(
rn = 2
AND ACCOUNT_NUM <> ACCOUNT_NUM_prev
)
)
--AND MAX_UPDATE_DATE >= GETDATE() - 8
;

最终结果

+----------+--------+-----------+------------+-----------+---------+-------------+--------------+----------+-------------------------+-------------------------+-------------------------+----+--------------+--------------+------------------+------------------+
| ROW_TYPE | EMPLID | VENDOR_ID | FIRST_NAME | LAST_NAME | BANK_CD | ACCOUNT_NUM | ACCOUNT_TYPE | PRIORITY |    LAST_UPDATE_DATE     |          EFFDT          |     MAX_UPDATE_DATE     | RN | BANK_CD_prev | BANK_CD_next | ACCOUNT_NUM_prev | ACCOUNT_NUM_next |
+----------+--------+-----------+------------+-----------+---------+-------------+--------------+----------+-------------------------+-------------------------+-------------------------+----+--------------+--------------+------------------+------------------+
| NEW ROW  |  21345 | 12345A    | J          | Smith     | 0224547 |    59167824 | C            |      999 | 2019-02-28 00:00:00.000 | 2019-02-28 00:00:00.000 | 2019-02-28 00:00:00.000 |  1 | NULL         | 0224547      | NULL             | 591678           |
| OLD ROW  |  21345 | 12345A    | J          | Smith     | 0224547 |      591678 | C            |      999 | 2017-11-08 00:00:00.000 | 2017-10-17 00:00:00.000 | 2019-02-28 00:00:00.000 |  2 | 0224547      | NULL         | 59167824         | NULL             |
+----------+--------+-----------+------------+-----------+---------+-------------+--------------+----------+-------------------------+-------------------------+-------------------------+----+--------------+--------------+------------------+------------------+

我只想将以下分析函数添加到您的 CTE 中以返回前置函数:

LAG(A.BANK_CD) OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC) AS BANK_CD_prev,
LAG(A.ACCOUNT_NUM) OVER (PARTITION BY A.EMPLID ORDER BY A.EFFDT DESC) AS ACCOUNT_NUM_prev

然后,您只需要在主查询中引用它即可完成

FROM CTE 
WHERE RN IN (1, 2) 
AND MAX_UPDATE_DATE >= GETDATE() - 8
AND BANK_CD <> BANK_CD_prev
AND ACCOUNT_NUM <> ACCOUNT_NUM_prev

如果这解决了您的问题,请回复。由于您没有提供示例进行测试,因此很难说。

编辑:我的意思是:将该解决方案与您的初始方法一起使用,而无需自我加入。 它不会帮助你解决问题。

相关内容

  • 没有找到相关文章

最新更新