我有这样的表格
|---------------------|------------------|---------------------|------------------|
| FILE | UNIT | DATE | ID SEQUENCE |
|---------------------|------------------|---------------------|------------------|
| 10 | 34 | 01/02/2000 | 10 |
|---------------------|------------------|---------------------|------------------|
| 10 | 34 | 01/05/2000 | 11 |
|---------------------|------------------|---------------------|------------------|
| 10 | 40 | 01/05/2000 | 12 |
|---------------------|------------------|---------------------|------------------|
| 10 | 40 | 01/02/2000 | 13 |
|---------------------|------------------|---------------------|------------------|
| 10 | 40 | 01/02/2000 | 14 |
|---------------------|------------------|---------------------|------------------|
| 10 | 40 | 01/15/2000 | 15 |
|---------------------|------------------|---------------------|------------------|
| 10 | 34 | 01/16/2000 | 16 |
|---------------------|------------------|---------------------|------------------|
| 10 | 70 | 01/17/2000 | 17 |
|---------------------|------------------|---------------------|------------------|
| 10 | 70 | 01/28/2000 | 18 |
|---------------------|------------------|---------------------|------------------|
我需要构建这样的视图(获取文件在每个单元中停留的天数(
|---------------------|------------------|---------------------|
| FILE | UNIT | DAYS IN UNITY |
|---------------------|------------------|---------------------|
| 10 | 34 | 3 |
|---------------------|------------------|---------------------|
| 10 | 40 | 10 |
|---------------------|------------------|---------------------|
| 10 | 34 | 1 |
|---------------------|------------------|---------------------|
| 10 | 70 | 11 |
|---------------------|------------------|---------------------|
任何建议
提前致谢
这是间隙和孤岛的一种形式。 为此,我正在考虑行号的差异:
select file, unit,
(lead(min(date), 1, max(date)) over (partition by file, unit) -
min(date)
) as days_in_unity
from (select t.*,
row_number() over (partition by file order by id_sequence) as seqnum,
row_number() over (partition by file, unit order by id_sequence) as seqnum_2
from t
) t
group by file, unit, (seqnum - seqnum_2)
为什么这有效有点难以解释。 如果您查看子查询的结果,您将看到对于具有相同unit
值的行,两个行号的差异是如何恒定的。
如果您使用的是较新版本的 Oracle(12c 及更高版本(,则使用 MATCH_RECOGNIZE(模式匹配(可能是一个想法。 在下面的查询中,我们定义了 2 种模式:
{1}多天,包括开始日期和结束日期。 此模式可能包含"不确定"日期,例如 01/02/2000 在 01/05/2000 和 01/15/2000 之间的单元 40。
{2}一天:当"开始"和"结束"在同一天时,就会发生这种情况。 在 MEASURES 子句中,我们选取我们需要的所有列,并将它们合并到 SELECT 子句中(此处MATCH_RECOGNIZE文档(。
桌子
create table fileandunit( FILE_, UNIT, DATE_, ID_SEQUENCE )
as
select 10, 34, to_date( '02-JAN-2000', 'DD-MON-YYYY'), 10 from dual union all
select 10, 34, to_date( '05-JAN-2000', 'DD-MON-YYYY'), 11 from dual union all
select 10, 40, to_date( '05-JAN-2000', 'DD-MON-YYYY'), 12 from dual union all
select 10, 40, to_date( '02-JAN-2000', 'DD-MON-YYYY'), 13 from dual union all
select 10, 40, to_date( '02-JAN-2000', 'DD-MON-YYYY'), 14 from dual union all
select 10, 40, to_date( '15-JAN-2000', 'DD-MON-YYYY'), 15 from dual union all
select 10, 34, to_date( '16-JAN-2000', 'DD-MON-YYYY'), 16 from dual union all
select 10, 70, to_date( '17-JAN-2000', 'DD-MON-YYYY'), 17 from dual union all
select 10, 70, to_date( '28-JAN-2000', 'DD-MON-YYYY'), 18 from dual ;
数据和模式
select * from fileandunit order by id_sequence ;
FILE_ UNIT DATE_ ID_SEQUENCE
10 34 02-JAN-00 10 -- start
10 34 05-JAN-00 11 -- end
10 40 05-JAN-00 12 -- start
10 40 02-JAN-00 13 -- iffy
10 40 02-JAN-00 14 -- iffy
10 40 15-JAN-00 15 -- end
10 34 16-JAN-00 16 -- single day
10 70 17-JAN-00 17 -- start
10 70 28-JAN-00 18 -- end
查询
select
coalesce( RP.m_file, RP.s_file ) file_
, coalesce( RP.m_unit, RP.s_unit ) unit_
, coalesce( ( RP.m_end - RP.m_start ), 1 ) days_
from fileandunit
match_recognize(
partition by file_ order by id_sequence
measures
enddt.file_ as m_file -- m_: for multiple days
, enddt.unit as m_unit
, startdt.date_ as m_start
, enddt.date_ as m_end
, singledt.file_ as s_file -- s_: single day
, singledt.unit as s_unit
, singledt.date_ as s_date
one row per match
pattern ( ( startdt iffydt* enddt ) | singledt ) -- multiple days (or) single day
define
startdt as ( prev( date_ ) <= date_ or prev( date_ ) is null )
and ( prev( unit ) <> unit or prev( unit ) is null )
--
, enddt as ( next( date_ ) >= date_ or next( date_ ) is null )
and ( next( unit ) <> unit or next( unit ) is null )
--
, iffydt as ( prev( date_ ) >= date_ ) -- detect incorrect dates inside a multiple day block
and ( prev( unit ) = unit )
--
, singledt as ( prev( date_ ) = date_ - 1 and next( date_ ) = date_ + 1 )
and ( prev( unit ) <> unit and next( unit ) <> unit )
) RP ;
结果
FILE_ UNIT_ DAYS_
10 34 3
10 40 10
10 34 1
10 70 11
使用 Oracle 18c 进行测试。 在这里摆弄。