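I'm trying to summarise an employee table that holds multiple records for the time an employee spends in a team. I've tried GROUP BY, MIN/MAX with PARTITION BY, and LEAD/LAG on the team name, but every attempt has the same problem: an agent who moves away from a team and later returns to the original team is grouped as a single occurrence, even when I order by date.
Sample data: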
Employee Name | Employee ID | Team Leader | Location | Start Date | End Date
John Smith | 123123 | Team A | Site A | 01/JAN/19 | 02/JAN/19
John Smith | 123123 | Team A | Site A | 02/JAN/19 | 03/JAN/19
John Smith | 123123 | Team B | Site A | 03/JAN/19 | 04/JAN/19
John Smith | 123123 | Team A | Site A | 04/JAN/19 | 05/JAN/19
John Smith | 123123 | Team B | Site A | 05/JAN/19 | 06/JAN/19
When I run this sample query:
SELECT
Employee Name
,Employee ID
,Team Leader
,Location
,MIN(Start Date) OVER(PARTITION BY Team Leader ORDER BY Employee ID, Start Date) AS Starting Date
,MAX(End Date) OVER(PARTITION BY Team Leader ORDER BY Employee ID, End Date) AS End Date
FROM TABLE 1
The results are as follows:
Employee Name | Employee ID | Team Leader | Location | Start Date | End Date
John Smith | 123123 | Team A | Site A | 01/JAN/19 | 05/JAN/19
John Smith | 123123 | Team B | Site A | 03/JAN/19 | 06/JAN/19
Can anyone help me get the desired result:
Employee Name | Employee ID | Team Leader | Location | Start Date | End Date
John Smith | 123123 | Team A | Site A | 01/JAN/19 | 03/JAN/19
John Smith | 123123 | Team B | Site A | 03/JAN/19 | 04/JAN/19
John Smith | 123123 | Team A | Site A | 04/JAN/19 | 05/JAN/19
John Smith | 123123 | Team B | Site A | 05/JAN/19 | 06/JAN/19
Here's one option. The test CTE represents your data (simplified a bit); the useful code begins at line #8.
SQL> with test (ename, team, start_date, end_date) as
2 (select 'John', 'A', date '2019-01-01', date '2019-01-02' from dual union all
3 select 'John', 'A', date '2019-01-02', date '2019-01-03' from dual union all
4 select 'John', 'B', date '2019-01-03', date '2019-01-04' from dual union all
5 select 'John', 'A', date '2019-01-04', date '2019-01-05' from dual union all
6 select 'John', 'B', date '2019-01-05', date '2019-01-06' from dual
7 ),
8 temp as
9 (select ename, team, start_date, end_date,
10 row_number() over (partition by ename order by start_date) rn,
11 row_number() over (partition by ename, team order by start_date) rna
12 from test
13 )
14 select ename, team, min(start_date) start_date, max(end_date) end_date
15 from temp
16 group by ename, team, (rn - rna)
17 order by 3;
ENAM T START_DATE END_DATE
---- - ----------- -----------
John A 01/jan/2019 03/jan/2019
John B 03/jan/2019 04/jan/2019
John A 04/jan/2019 05/jan/2019
John B 05/jan/2019 06/jan/2019
SQL>
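To see why grouping on (rn - rna) separates the stints, the following is a minimal Python sketch that traces the same arithmetic on the sample rows. The function name and tuple layout are hypothetical, chosen only for this illustration, and rn is numbered per employee here, which is an assumption of the sketch that makes no difference on this single-employee sample.

```python
from datetime import date

# Sample rows mirroring the test CTE: (ename, team, start_date, end_date)
ROWS = [
    ("John", "A", date(2019, 1, 1), date(2019, 1, 2)),
    ("John", "A", date(2019, 1, 2), date(2019, 1, 3)),
    ("John", "B", date(2019, 1, 3), date(2019, 1, 4)),
    ("John", "A", date(2019, 1, 4), date(2019, 1, 5)),
    ("John", "B", date(2019, 1, 5), date(2019, 1, 6)),
]


def islands_by_row_number_difference(rows):
    """Collapse consecutive rows of the same team using the rn - rna trick."""
    rn_counter = {}   # row number per employee, ordered by start date
    rna_counter = {}  # row number per (employee, team), ordered by start date
    islands = {}      # (ename, team, rn - rna) -> (min start, max end)

    for ename, team, start, end in sorted(rows, key=lambda r: (r[0], r[2])):
        rn = rn_counter[ename] = rn_counter.get(ename, 0) + 1
        rna = rna_counter[(ename, team)] = rna_counter.get((ename, team), 0) + 1
        # The difference stays constant while the team does not change and
        # jumps as soon as rows of another team have been numbered in between.
        key = (ename, team, rn - rna)
        lo, hi = islands.get(key, (start, end))
        islands[key] = (min(lo, start), max(hi, end))

    result = [(e, t, s, f) for (e, t, _), (s, f) in islands.items()]
    return sorted(result, key=lambda r: r[2])


if __name__ == "__main__":
    for row in islands_by_row_number_difference(ROWS):
        print(row)
```

On the sample data the differences come out as 0, 0, 2, 1, 3, so the two Team A stints land in different groups even though they share a team.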
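If you are on Oracle 12c or later, row pattern matching (MATCH_RECOGNIZE) is a good alternative solution. Unlike the "gaps and islands" solution, this one also handles overlaps. The WITH clause contains the test data; the solution starts right after it.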
with test (ename, team, start_date, end_date) as
(select 'John', 'A', date '2019-01-01', date '2019-01-02' from dual union all
select 'John', 'A', date '2019-01-02', date '2019-01-03' from dual union all
select 'John', 'B', date '2019-01-03', date '2019-01-04' from dual union all
select 'John', 'A', date '2019-01-04', date '2019-01-05' from dual union all
select 'John', 'B', date '2019-01-05', date '2019-01-06' from dual
)
select * from test
match_recognize(
partition by ename, team order by start_date
measures first(start_date) start_date, last(end_date) end_date
pattern(a b*)
define b as start_date <= a.end_date
)
order by ename, start_date;
ENAM T START_DATE END_DATE
---- - ---------------- ----------------
John A 2019-01-01 00:00 2019-01-03 00:00
John B 2019-01-03 00:00 2019-01-04 00:00
John A 2019-01-04 00:00 2019-01-05 00:00
John B 2019-01-05 00:00 2019-01-06 00:00
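The general idea of merging ranges that touch or overlap within one (ename, team) partition can be sketched in Python as below. This is an illustration under stated assumptions, not a line-for-line translation of the DEFINE clause: the function name is hypothetical, and the sketch compares each row with the running maximum end date of the current merged range, which is a choice made here for clarity.

```python
from datetime import date

# Sample rows mirroring the test CTE: (ename, team, start_date, end_date)
ROWS = [
    ("John", "A", date(2019, 1, 1), date(2019, 1, 2)),
    ("John", "A", date(2019, 1, 2), date(2019, 1, 3)),
    ("John", "B", date(2019, 1, 3), date(2019, 1, 4)),
    ("John", "A", date(2019, 1, 4), date(2019, 1, 5)),
    ("John", "B", date(2019, 1, 5), date(2019, 1, 6)),
]


def merge_touching_ranges(rows):
    """Merge ranges of the same (ename, team) that touch or overlap."""
    merged = []
    # Sort by partition key first, then by start date inside each partition.
    for ename, team, start, end in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        last = merged[-1] if merged else None
        same_partition = last is not None and last[:2] == (ename, team)
        if same_partition and start <= last[3]:
            # The row starts on or before the current range ends: extend it.
            merged[-1] = (ename, team, last[2], max(last[3], end))
        else:
            # A gap (or a new partition) starts a new range.
            merged.append((ename, team, start, end))
    return sorted(merged, key=lambda r: (r[0], r[2]))


if __name__ == "__main__":
    for row in merge_touching_ranges(ROWS):
        print(row)
```

Because the comparison is "starts on or before the current end", a row that begins inside an earlier range is absorbed as well, which is the overlap case the row-number approach does not cover.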
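This looks like a form of gaps and islands in which the records are linked by their date ranges.
Here is one approach: it uses a left join to find where each island begins, then a cumulative sum to identify the groups, and finally an aggregation: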
select employeename, employeeid, teamleader, location,
       min(startdate) as startdate, max(enddate) as enddate
from (select t1.*,
             sum(case when tprev.employeeid is null -- no adjoining earlier row: a new island starts
                      then 1 else 0
                 end) over (partition by t1.employeeid, t1.teamleader, t1.location
                            order by t1.startdate
                           ) as grp
      from table1 t1 left join
           table1 tprev
           on t1.startdate = tprev.enddate and
              t1.employeeid = tprev.employeeid and
              t1.teamleader = tprev.teamleader and
              t1.location = tprev.location
     ) t
group by employeename, employeeid, teamleader, location, grp
order by employeeid, min(startdate);
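The two steps (flag the rows that start an island, then take a running sum of the flags) can be traced with a small Python sketch. The function name and tuple layout are hypothetical and exist only for this illustration; the set lookup stands in for the left join, and the per-partition counter stands in for the windowed sum.

```python
from datetime import date

# Sample rows: (employeeid, teamleader, location, startdate, enddate)
ROWS = [
    (123123, "Team A", "Site A", date(2019, 1, 1), date(2019, 1, 2)),
    (123123, "Team A", "Site A", date(2019, 1, 2), date(2019, 1, 3)),
    (123123, "Team B", "Site A", date(2019, 1, 3), date(2019, 1, 4)),
    (123123, "Team A", "Site A", date(2019, 1, 4), date(2019, 1, 5)),
    (123123, "Team B", "Site A", date(2019, 1, 5), date(2019, 1, 6)),
]


def islands_by_start_flag(rows):
    """Flag island starts, then number the islands with a running sum."""
    # Stand-in for the left join: every (partition, enddate) that exists.
    known_ends = {(emp, team, loc, end) for emp, team, loc, _, end in rows}
    running = {}  # running sum of start flags per (emp, team, loc)
    islands = {}  # (emp, team, loc, group number) -> (min start, max end)

    for emp, team, loc, start, end in sorted(rows, key=lambda r: (r[0], r[1], r[2], r[3])):
        part = (emp, team, loc)
        # A row starts a new island when no row in the same partition
        # ends exactly where this one starts.
        starts_island = (emp, team, loc, start) not in known_ends
        running[part] = running.get(part, 0) + (1 if starts_island else 0)
        key = part + (running[part],)
        lo, hi = islands.get(key, (start, end))
        islands[key] = (min(lo, start), max(hi, end))

    result = [(k[0], k[1], k[2], s, e) for k, (s, e) in islands.items()]
    return sorted(result, key=lambda r: (r[0], r[3]))


if __name__ == "__main__":
    for row in islands_by_start_flag(ROWS):
        print(row)
```

Like the query, this links rows only when one starts exactly where another ends, so overlapping ranges would need a different matching condition.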