我需要将订单分组在一起,并且只跨越它们的日期范围
场景A.
- 订单1,1.3.22020-30.6.2020
- 订单2,1.5.2020-31.8.2020
- 订单3,2020年7月31日至2020年10月31日
- 订单4,2020年7月31日至2020年12月31日
所以输出应该是
- 订单1,订单2
- 订单2、订单3、订单4
order1,3,4未分组,因为它们的范围根本不交叉
场景B.
与上述相同,再加上另一个订单
- 订单5,2020年1月1日至2020年12月31日
因此输出将为
- 订单1、订单2、订单5
- 订单2、订单3、订单4、订单5
我尝试了Self Join来检查哪个开始日期在该范围内。因此在顺序1的范围内仅落在顺序2->我们有一组则在顺序2的范围内,顺序3和顺序4的开始日期都下降->我们有第二组但随后对于顺序3落在顺序4的开始日期并且相反->这将给出另外两个组,但它们是无效的,因为顺序2也跨越了它们的日期范围,也应该包括在内,因为将有3个双副本,我们应该像在所需输出中一样只显示一次,但这种方法将失败。
感谢
MATCH_RECOGNIZE解决方案的结果不正确,因为顺序5应该在两组中
我用一些分析函数来解决这个问题:
--创建表
Create table cross_dates (order_id number, start_date date , end_date date);
--插入日期
insert into cross_dates values( 1, to_date('01.03.2020', 'dd.mm.yyyy'), to_date('30.06.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 2, to_date('01.05.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 3, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 4, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.10.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 5, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'));
--SQL
select 'Order '|| min_order_id ||': ', listagg( order_id, ',') within group (order by order_id) list
from (
select distinct min_order_id, order_id from (
with dates (cur_date, end_date, order_id, start_date) as (
select start_date, end_date, order_id, start_date
from cross_Dates
union all
select cur_date + 1, end_date, order_id,start_date
from dates
where cur_date < end_date )
select d.order_id,
min(d.order_id) over(partition by greatest(d.start_date, cd.start_date)) min_order_id
from dates d, cross_Dates cd
where d.cur_date between cd.start_date and cd.end_date ))
group by min_order_id
having count(*) > 1;
结果:
Order 1: 1,2,5
Order 2: 2,3,4,5
-添加新列并更新旧记录
alter table cross_dates add (item varchar2(1));
update cross_dates set item = 'A';
--插入新记录B
insert into cross_dates values( 1, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '30.06.2020', 'dd.mm.yyyy'), 'B');
insert into cross_dates values( 1, to_date('01.07.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'), 'B');
我的假设:
- A和B是不同的顺序,即使在交叉时也不属于同一组
- 订单1B-有两条记录作为延续-在我看来,这就像一个订单:订单1B 01.01.2020-21.12.2020
如果我的假设是正确的,SQL可能看起来像这样:
select distinct min_order_id, order_id, item from (
with dates (cur_date, end_date, order_id, start_date, item) as (
select start_date, end_date, order_id, start_date, item
from cross_Dates
union all
select cur_date + 1, end_date, order_id,start_date, item
from dates
where cur_date < end_date )
select d.order_id, d.item,
min(d.order_id) over(partition by greatest(d.start_date, cd.start_date),d.item) min_order_id
from dates d, cross_Dates cd
where d.cur_date between cd.start_date and cd.end_date and d.item = cd.item )
order by item, min_order_id;
结果:
MIN_ORDER_ID ORDER_ID I
1 1 A
1 2 A
1 5 A
2 2 A
2 3 A
2 4 A
2 5 A
5 5 A
1 1 B
如果我的假设不正确,请提供我在这种情况下的结果。
:(
您可以使用MATCH_RECOGNIZE
查找下一个值的开始日期早于或等于组中所有以前值的结束日期的组。然后,您可以聚合和排除完全包含在另一个组中的组:
WITH groups ( id, ids, start_date, end_date ) AS (
SELECT id,
LISTAGG( grp_id, ',' ) WITHIN GROUP ( ORDER BY start_date ),
MIN( start_date ),
MIN( end_date )
FROM (
SELECT t.id,
x.id AS grp_id,
x.start_date,
x.end_date
FROM table_name t
INNER JOIN table_name x
ON (
x.start_date >= t.start_date
AND x.start_date <= t.end_date
)
)
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY start_date
MEASURES
MATCH_NUMBER() AS mno
ALL ROWS PER MATCH
PATTERN ( FIRST_ROW GROUPED_ROWS* )
DEFINE GROUPED_ROWS AS (
GROUPED_ROWS.start_date <= MIN( end_date )
)
)
WHERE mno = 1
GROUP BY id
)
SELECT id,
ids
FROM groups g
WHERE NOT EXISTS (
SELECT 1
FROM groups x
WHERE g.ID <> x.ID
AND x.start_date <= g.start_date
AND g.end_date <= x.end_date
)
样本数据:
CREATE TABLE table_name ( id, start_date, end_date ) AS
SELECT 'order 1', DATE '2020-03-01', DATE '2020-06-30' FROM DUAL UNION ALL
SELECT 'order 2', DATE '2020-05-01', DATE '2020-08-31' FROM DUAL UNION ALL
SELECT 'order 3', DATE '2020-07-31', DATE '2020-10-31' FROM DUAL UNION ALL
SELECT 'order 4', DATE '2020-07-31', DATE '2020-12-31' FROM DUAL;
输出:
ID|IDS:------|:----------------------订单2|订单2、订单3、订单4订单1|订单1,订单2
我当时:
INSERT INTO table_name ( id, start_date, end_date )
VALUES ( 'order 5', DATE '2020-01-01', DATE '2020-12-31' );
输出为:
ID|IDS:------|:----------------------订单2|订单2、订单3、订单4订单5 |订单5,订单1,订单2
db<gt;小提琴这里