如何在多个数据库记录中确定实体是否不断活跃



示例数据集,其中每一行记录独特的人/主要组合的活动时间

注意:此示例数据集

中显示了属于以下每个人的所有记录

Note2:终止日期为9999/12/31仅意味着该人目前已入学。

TBL_MAJORS
Person_ID   Major         Effective_Date   Termination_Date
76          Biology       2016/8/1         2017/2/24   
76          Accounting    2017/2/25        2017/4/15  
76          Math          2017/4/16        9999/12/31         
102         Physics       2016/8/1         2018/7/21
58          Literature    2016/8/1         2017/5/13
94          Art History   2016/8/1         2017/8/30   
94          Comp Sci      2018/3/1         9999/12/31

最终目标数据集带有新列,指示每个人是否在2017年全年不断注册专业

TBL_MAJORS_2
Person_ID   Major         Effective_Date   Termination_Date   Active_Throughout_2017
76          Biology       2016/8/1         2017/2/24          1
76          Accounting    2017/2/25        2017/4/15          1
76          Math          2017/4/16        9999/12/31         1
102         Physics       2016/8/1         2018/7/21          1
58          Literature    2016/8/1         2017/5/13          0
94          Art History   2016/8/1         2017/8/30          0
94          Comp Sci      2018/3/1         9999/12/31         0

更多说明

因此,例如,第76人的大满贯有三个不同的专业,从2016/8/1到9999/12/31(即他们目前正在注册(,因此他们在整个2017年都被录入了。

另一方面,第94个人有两个不同的专业,但是从2017/8/31到2018/2/28的活动存在差距,因此他们在2017年没有连续入学。

谁能帮助我实现这一目标?奖励很荣幸,如果您可以使其表现良好...我认为这可能会涉及某种窗口/分析功能,但老实说我很难过。

这可以像这样做:

WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
SELECT person_id,
       major,
       effective_date,
       termination_date,
       CASE WHEN to_date('01/01/2017', 'dd/mm/yyyy') >= MIN(effective_date) OVER (PARTITION BY person_id, grp)
                 AND to_date('31/12/2017', 'dd/mm/yyyy') <= MAX(termination_date) OVER (PARTITION BY person_id, grp)
                 THEN 1
            ELSE 0
       END active_throughout_2017
FROM   (SELECT person_id,
               major,
               effective_date,
               termination_date,
               SUM(new_grp_start) OVER (PARTITION BY person_id ORDER BY effective_date) grp
        FROM   (SELECT person_id,
                       major,
                       effective_date,
                       termination_date,
                       CASE WHEN effective_date = LAG(termination_date) OVER (PARTITION BY person_id ORDER BY effective_date) + 1 THEN 0 ELSE 1 END new_grp_start
                FROM   tbl_majors));
 PERSON_ID MAJOR       EFFECTIVE_DATE TERMINATION_DATE ACTIVE_THROUGHOUT_2017
---------- ----------- -------------- ---------------- ----------------------
        58 Literature  01/08/2016     13/05/2017                            0
        76 Accounting  25/02/2017     15/04/2017                            1
        76 Biology     01/08/2016     24/02/2017                            1
        76 Math        16/04/2017     31/12/9999                            1
        94 Art History 01/08/2016     30/08/2017                            0
        94 Comp Sci    01/03/2018     31/12/9999                            0
       102 Physics     01/08/2016     21/07/2018                            1

这首先锻炼当前行是否是连续的 - 即当前行的有效_date是否比上一行的termination_date。

一旦我们知道上一行是否是连续的,我们就会知道这是一个新组的开始。然后,我们可以对此进行累积总和,这意味着所有连续的行都将获得相同的数字,并且每次启动新的行时数字都会增加。

那么,为每个人和小组制定最小效力和最大termination_date是一个简单的问题,然后与2017年的第一次和最后几天进行重叠的比较。


eta:这是基于Gsazheniuk的建议的解决方案:

WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
SELECT person_id,
       major,
       effective_date,
       termination_date,
       CASE WHEN SUM(LEAST(to_date('31/12/2017', 'dd/mm/yyyy'), termination_date) + 1 - GREATEST(to_date('01/01/2017', 'dd/mm/yyyy'), effective_date)) OVER (PARTITION BY person_id)
                   = to_date('31/12/2017', 'dd/mm/yyyy') + 1 - to_date('01/01/2017', 'dd/mm/yyyy')
                 THEN 1
            ELSE 0
       END active_throughout_2017
FROM   tbl_majors;

您可以看到,它比我的原始解决方案更优雅。您应该测试两者,以查看哪个是数据的性能,等等。

两种解决方案都假设不会有重叠(即,一个学生在任何时候只有一个专业(;您必须相应地调整它们以适应某人一次可能拥有多个专业的情况。

最新更新