我有一个查询,它返回以下内容,除了最后一列,这是我需要弄清楚如何创建的。对于每个给定的ObservationID
,我需要返回状态变化的日期;类似于LEAD()函数的东西,它将接受条件而不仅仅是偏移量。这能做到吗?
我需要计算列Change Date;它应该是状态不是当前状态的最后日期。
+---------------+--------+-----------+--------+-------------+
| ObservationID | Region | Date | Status | Change Date | <-This field
+---------------+--------+-----------+--------+-------------+
| 1 | 10 | 1/3/2012 | Ice | 1/4/2012 |
| 2 | 10 | 1/4/2012 | Water | 1/6/2012 |
| 3 | 10 | 1/5/2012 | Water | 1/6/2012 |
| 4 | 10 | 1/6/2012 | Gas | 1/7/2012 |
| 5 | 10 | 1/7/2012 | Ice | |
| 6 | 20 | 2/6/2012 | Water | 2/10/2012 |
| 7 | 20 | 2/7/2012 | Water | 2/10/2012 |
| 8 | 20 | 2/8/2012 | Water | 2/10/2012 |
| 9 | 20 | 2/9/2012 | Water | 2/10/2012 |
| 10 | 20 | 2/10/2012 | Ice | |
+---------------+--------+-----------+--------+-------------+
一个模型子句(10g+)可以以一种紧凑的方式做到这一点:
SQL> create table observation(ObservationID , Region ,obs_date, Status)
2 as
3 select 1, 10, date '2012-03-01', 'Ice' from dual union all
4 select 2, 10, date '2012-04-01', 'Water' from dual union all
5 select 3, 10, date '2012-05-01', 'Water' from dual union all
6 select 4, 10, date '2012-06-01', 'Gas' from dual union all
7 select 5, 10, date '2012-07-01', 'Ice' from dual union all
8 select 6, 20, date '2012-06-02', 'Water' from dual union all
9 select 7, 20, date '2012-07-02', 'Water' from dual union all
10 select 8, 20, date '2012-08-02', 'Water' from dual union all
11 select 9, 20, date '2012-09-02', 'Water' from dual union all
12 select 10, 20, date '2012-10-02', 'Ice' from dual ;
Table created.
SQL> select ObservationID, obs_date, Status, status_change
2 from observation
3 model
4 dimension by (Region, obs_date, Status)
5 measures ( ObservationID, obs_date obs_date2, cast(null as date) status_change)
6 rules (
7 status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]
8 )
9 order by 1;
OBSERVATIONID OBS_DATE STATU STATUS_CH
------------- --------- ----- ---------
1 01-MAR-12 Ice 01-APR-12
2 01-APR-12 Water 01-JUN-12
3 01-MAY-12 Water 01-JUN-12
4 01-JUN-12 Gas 01-JUL-12
5 01-JUL-12 Ice
6 02-JUN-12 Water 02-OCT-12
7 02-JUL-12 Water 02-OCT-12
8 02-AUG-12 Water 02-OCT-12
9 02-SEP-12 Water 02-OCT-12
10 02-OCT-12 Ice
小提琴:http://sqlfiddle.com/# !4/f6687/1
。我们将对区域、日期和状态进行维度划分,因为我们希望查看具有相同区域的单元格,但要获得状态不同的第一个日期。
我们也必须测量日期,所以我创建了一个别名obs_date2
来做这件事,我们想要一个新的列status_change
来保存状态改变的日期。
这行是为我们做所有工作的行:
status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]
它说,对于我们的三个维度,只查看具有相同区域(cv(Region),
)的行,并查看日期跟随当前行(obs_date > cv(obs_date)
)日期的行,并且状态与当前行(status != cv(status)
)不同,最终获得满足这组条件的最小日期(min(obs_date2)
)并将其分配给status_change
。左边的any,any,any
部分表示此计算适用于所有行。
我已经尝试了很多次来理解MODEL子句,但从来没有真正管理好它,所以我认为我应该添加另一个解决方案
该解决方案采用了Ronnis所做的一些工作,但使用了LEAD函数的IGNORE NULLS
子句。我认为这只是Oracle 11的新功能,但如果有必要,您可能可以将其替换为Oracle 10的FIRST_VALUE
函数。
select
observation_id,
region,
observation_date,
status,
lead(case when is_change = 'Y' then observation_date end) ignore nulls
over (partition by region order by observation_date) as change_observation_date
from (
select
a.observation_id,
a.region,
a.observation_date,
a.status,
case
when status = lag(status) over (partition by region order by observation_date)
then null
else 'Y' end as is_change
from observations a
)
order by 1
在清理重叠的from/to-dates和重复的行时,我经常这样做。您的情况要简单得多,因为您只有"from-date":)
设置测试数据
create table observations(
observation_id number not null
,region number not null
,observation_date date not null
,status varchar2(10) not null
);
insert
into observations(observation_id, region, observation_date, status)
select 1, 10, date '2012-03-01', 'Ice' from dual union all
select 2, 10, date '2012-04-01', 'Water' from dual union all
select 3, 10, date '2012-05-01', 'Water' from dual union all
select 4, 10, date '2012-06-01', 'Gas' from dual union all
select 5, 10, date '2012-07-01', 'Ice' from dual union all
select 6, 20, date '2012-06-02', 'Water' from dual union all
select 7, 20, date '2012-07-02', 'Water' from dual union all
select 8, 20, date '2012-08-02', 'Water' from dual union all
select 9, 20, date '2012-09-02', 'Water' from dual union all
select 10, 20, date '2012-10-02', 'Ice' from dual;
commit;
下面的查询有三个兴趣点:
- 识别重复信息(记录显示与前一次记录相同)
- 忽略重复录音
- 确定"下一次"变更的日期
.
with lagged as(
select a.*
,case when status = lag(status, 1) over(partition by region
order by observation_date)
then null
else rownum
end as change_flag -- 1
from observations a
)
select observation_id
,region
,observation_date
,status
,lead(observation_date, 1) over(
partition by region
order by observation_date
) as change_date --3
,lead(observation_date, 1, sysdate) over(
partition by region
order by observation_date
) - observation_date as duration
from lagged
where change_flag is not null -- 2
;