如何在SQL中仅选择具有2个连续"Yes"值的行,按年份排序?



我有一个查询,用于返回每个日历年每个员工的样本值,还有一列用于检查(是/否(样本值是否为>=60000.

我的初始数据:

Employee_ID Calendar_Year Sample_Value  Sample_Check
1234        2020          55,000        No
1234        2021          70,000        Yes
1234        2022          50,000        No
3456        2020          80,000        Yes
3456        2021          40,000        No
3456        2022          65,000        Yes
5678        2020          30,000        No
5678        2021          70,000        Yes
5678        2022          90,000        Yes

我想得到这个结果,因为这个员工是唯一一个有";是";连续2个日历年。

Employee_ID Calendar_Year Sample_Value  Sample_Check
5678        2022          90,000        Yes

我查找过类似的问题,但找不到解决我问题的方法。我也研究过LAG和LEAD,但需要帮助了解他们是否能给我想要的结果。

我倾向于使用相关查询来查找符合条件的行,然后使用row_number窗口来选择所需的每组中最大/最小的行:

with v as (
select *, 
case when exists (
select * from t t2 
where t2.Employee_ID = t.Employee_ID 
and t.Sample_Check = 'Yes' 
and t2.Sample_Check = 'Yes' 
and t2.Calendar_Year = t.Calendar_Year - 1
) then 1 else 0 end valid
from t
), s as (
select *,
Row_Number() over(partition by Employee_ID, valid order by Calendar_Year desc) rn
from v
)
select Employee_Id, Calendar_Year, Sample_Value, Sample_Check
from s
where valid = 1 and rn = 1;

我不确定这有多防弹。我在窗口分区中使用了滞后函数来获得先前的Sample_Check。然后,我在外部查询上进行匹配,以获得记录(基本上显示yes=yes(。如果你连续有3个(是(,那么它会后退2个。如果遇到这种情况,您可能可以使用一些条件逻辑来偏移行

SELECT
*
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE

) X 
WHERE Sample_Check = LagSampleCheck1
ORDER BY Employee_ID ASC, Calendar_Year ASC 

我还创建了另一个row_number((Over(Partition BY Employee ID and Order BY Calendar year(的记录,这样,如果你有多个符合该标准的记录,它会选择最近一年。我在你的原始数据集中添加了另一条记录(Employees ID 5678,Calendar year 2023,Samples Value和Sample Check Yes(,也创建了两条记录。

Employee_ID Calendar_Year   Sample_Value    Sample_Check
1234        2020        55,000      No        
1234        2021        70,000      Yes       
1234        2022        50,000      No        
3456        2020        80,000      Yes       
3456        2021        40,000      No        
3456        2022        65,000      Yes       
5678        2020        30,000      No        
5678        2021        70,000      Yes       
5678        2022        90,000      Yes       
5678        2023        90,000      Yes

SELECT
*
FROM
(
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY CALENDAR_YEAR DESC) AS ROWCOUNTER
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE

) X 
WHERE Sample_Check = LagSampleCheck1
) Z
WHERE ROWCOUNTER = 1
ORDER BY Employee_ID ASC, Calendar_Year ASC

这是最简单的解决方案。只需将表格本身加入即可(假设日历年为数字(

SELECT t1.*, t2.sample_check
FROM data AS t1, data AS t2
WHERE t1.emp_id = t2.emp_id
AND t1.calendar_year = t2.calendar_year + 1
AND t1.sample_check = t2.sample_check
AND t1.sample_check = 'Yes'

测试

也可以得到同样的结果,滞后函数;

WITH temp AS (SELECT emp_id
, calendar_year
, sample_value
, sample_check
, lag( CASE WHEN sample_check = 'Yes' THEN 1 ELSE 0 END, 1 )
OVER (PARTITION BY emp_id ORDER BY calendar_year) AS prevcheck
FROM data)
SELECT *
FROM temp
WHERE prevcheck = 1
AND sample_check = 'Yes'

两者给出相同的结果

emp_id  calendar_year   sample_value    sample_check    prevcheck
5678      2022             90             Yes              1

测试

最新更新