示例数据集,其中每一行记录独特的人/主要组合的活动时间
注意:此示例数据集
中显示了属于以下每个人的所有记录Note2:终止日期为9999/12/31仅意味着该人目前已入学。
TBL_MAJORS
Person_ID Major Effective_Date Termination_Date
76 Biology 2016/8/1 2017/2/24
76 Accounting 2017/2/25 2017/4/15
76 Math 2017/4/16 9999/12/31
102 Physics 2016/8/1 2018/7/21
58 Literature 2016/8/1 2017/5/13
94 Art History 2016/8/1 2017/8/30
94 Comp Sci 2018/3/1 9999/12/31
最终目标数据集带有新列,指示每个人是否在2017年全年不断注册专业
TBL_MAJORS_2
Person_ID Major Effective_Date Termination_Date Active_Throughout_2017
76 Biology 2016/8/1 2017/2/24 1
76 Accounting 2017/2/25 2017/4/15 1
76 Math 2017/4/16 9999/12/31 1
102 Physics 2016/8/1 2018/7/21 1
58 Literature 2016/8/1 2017/5/13 0
94 Art History 2016/8/1 2017/8/30 0
94 Comp Sci 2018/3/1 9999/12/31 0
更多说明
因此,例如,第76人的大满贯有三个不同的专业,从2016/8/1到9999/12/31(即他们目前正在注册(,因此他们在整个2017年都被录入了。
。另一方面,第94个人有两个不同的专业,但是从2017/8/31到2018/2/28的活动存在差距,因此他们在2017年没有连续入学。
。谁能帮助我实现这一目标?奖励很荣幸,如果您可以使其表现良好...我认为这可能会涉及某种窗口/分析功能,但老实说我很难过。
这可以像这样做:
WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
SELECT person_id,
major,
effective_date,
termination_date,
CASE WHEN to_date('01/01/2017', 'dd/mm/yyyy') >= MIN(effective_date) OVER (PARTITION BY person_id, grp)
AND to_date('31/12/2017', 'dd/mm/yyyy') <= MAX(termination_date) OVER (PARTITION BY person_id, grp)
THEN 1
ELSE 0
END active_throughout_2017
FROM (SELECT person_id,
major,
effective_date,
termination_date,
SUM(new_grp_start) OVER (PARTITION BY person_id ORDER BY effective_date) grp
FROM (SELECT person_id,
major,
effective_date,
termination_date,
CASE WHEN effective_date = LAG(termination_date) OVER (PARTITION BY person_id ORDER BY effective_date) + 1 THEN 0 ELSE 1 END new_grp_start
FROM tbl_majors));
PERSON_ID MAJOR EFFECTIVE_DATE TERMINATION_DATE ACTIVE_THROUGHOUT_2017
---------- ----------- -------------- ---------------- ----------------------
58 Literature 01/08/2016 13/05/2017 0
76 Accounting 25/02/2017 15/04/2017 1
76 Biology 01/08/2016 24/02/2017 1
76 Math 16/04/2017 31/12/9999 1
94 Art History 01/08/2016 30/08/2017 0
94 Comp Sci 01/03/2018 31/12/9999 0
102 Physics 01/08/2016 21/07/2018 1
这首先锻炼当前行是否是连续的 - 即当前行的有效_date是否比上一行的termination_date。
一旦我们知道上一行是否是连续的,我们就会知道这是一个新组的开始。然后,我们可以对此进行累积总和,这意味着所有连续的行都将获得相同的数字,并且每次启动新的行时数字都会增加。
那么,为每个人和小组制定最小效力和最大termination_date是一个简单的问题,然后与2017年的第一次和最后几天进行重叠的比较。
eta:这是基于Gsazheniuk的建议的解决方案:
WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
SELECT person_id,
major,
effective_date,
termination_date,
CASE WHEN SUM(LEAST(to_date('31/12/2017', 'dd/mm/yyyy'), termination_date) + 1 - GREATEST(to_date('01/01/2017', 'dd/mm/yyyy'), effective_date)) OVER (PARTITION BY person_id)
= to_date('31/12/2017', 'dd/mm/yyyy') + 1 - to_date('01/01/2017', 'dd/mm/yyyy')
THEN 1
ELSE 0
END active_throughout_2017
FROM tbl_majors;
您可以看到,它比我的原始解决方案更优雅。您应该测试两者,以查看哪个是数据的性能,等等。
两种解决方案都假设不会有重叠(即,一个学生在任何时候只有一个专业(;您必须相应地调整它们以适应某人一次可能拥有多个专业的情况。