我有一个查询,用于返回每个日历年每个员工的样本值,还有一列用于检查(是/否(样本值是否为>=60000.
我的初始数据:
Employee_ID Calendar_Year Sample_Value Sample_Check
1234 2020 55,000 No
1234 2021 70,000 Yes
1234 2022 50,000 No
3456 2020 80,000 Yes
3456 2021 40,000 No
3456 2022 65,000 Yes
5678 2020 30,000 No
5678 2021 70,000 Yes
5678 2022 90,000 Yes
我想得到这个结果,因为这个员工是唯一一个有";是";连续2个日历年。
Employee_ID Calendar_Year Sample_Value Sample_Check
5678 2022 90,000 Yes
我查找过类似的问题,但找不到解决我问题的方法。我也研究过LAG和LEAD,但需要帮助了解他们是否能给我想要的结果。
我倾向于使用相关查询来查找符合条件的行,然后使用row_number窗口来选择所需的每组中最大/最小的行:
with v as (
select *,
case when exists (
select * from t t2
where t2.Employee_ID = t.Employee_ID
and t.Sample_Check = 'Yes'
and t2.Sample_Check = 'Yes'
and t2.Calendar_Year = t.Calendar_Year - 1
) then 1 else 0 end valid
from t
), s as (
select *,
Row_Number() over(partition by Employee_ID, valid order by Calendar_Year desc) rn
from v
)
select Employee_Id, Calendar_Year, Sample_Value, Sample_Check
from s
where valid = 1 and rn = 1;
我不确定这有多防弹。我在窗口分区中使用了滞后函数来获得先前的Sample_Check。然后,我在外部查询上进行匹配,以获得记录(基本上显示yes=yes(。如果你连续有3个(是(,那么它会后退2个。如果遇到这种情况,您可能可以使用一些条件逻辑来偏移行
SELECT
*
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE
) X
WHERE Sample_Check = LagSampleCheck1
ORDER BY Employee_ID ASC, Calendar_Year ASC
我还创建了另一个row_number((Over(Partition BY Employee ID and Order BY Calendar year(的记录,这样,如果你有多个符合该标准的记录,它会选择最近一年。我在你的原始数据集中添加了另一条记录(Employees ID 5678,Calendar year 2023,Samples Value和Sample Check Yes(,也创建了两条记录。
Employee_ID Calendar_Year Sample_Value Sample_Check
1234 2020 55,000 No
1234 2021 70,000 Yes
1234 2022 50,000 No
3456 2020 80,000 Yes
3456 2021 40,000 No
3456 2022 65,000 Yes
5678 2020 30,000 No
5678 2021 70,000 Yes
5678 2022 90,000 Yes
5678 2023 90,000 Yes
SELECT
*
FROM
(
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY CALENDAR_YEAR DESC) AS ROWCOUNTER
FROM
(
SELECT Employee_ID
,Calendar_Year
,Sample_Value
,Sample_Check
, LAG(Sample_Check) OVER (PARTITION BY Employee_ID ORDER BY Employee_ID ASC, Calendar_Year ASC) AS LagSampleCheck1
FROM EMPLOYEETABLE
) X
WHERE Sample_Check = LagSampleCheck1
) Z
WHERE ROWCOUNTER = 1
ORDER BY Employee_ID ASC, Calendar_Year ASC
这是最简单的解决方案。只需将表格本身加入即可(假设日历年为数字(
SELECT t1.*, t2.sample_check
FROM data AS t1, data AS t2
WHERE t1.emp_id = t2.emp_id
AND t1.calendar_year = t2.calendar_year + 1
AND t1.sample_check = t2.sample_check
AND t1.sample_check = 'Yes'
测试
也可以得到同样的结果,滞后函数;
WITH temp AS (SELECT emp_id
, calendar_year
, sample_value
, sample_check
, lag( CASE WHEN sample_check = 'Yes' THEN 1 ELSE 0 END, 1 )
OVER (PARTITION BY emp_id ORDER BY calendar_year) AS prevcheck
FROM data)
SELECT *
FROM temp
WHERE prevcheck = 1
AND sample_check = 'Yes'
两者给出相同的结果
emp_id calendar_year sample_value sample_check prevcheck
5678 2022 90 Yes 1
测试