我在一个表中有以下数据:
|event_id |starttime |person_id|attended|
|------------|-----------------|---------|--------|
| 11512997-1 | 01-SEP-16 08:00 | 10001 | N |
| 11512997-2 | 01-SEP-16 10:00 | 10001 | N |
| 11512997-3 | 01-SEP-16 12:00 | 10001 | N |
| 11512997-4 | 01-SEP-16 14:00 | 10001 | N |
| 11512997-5 | 01-SEP-16 16:00 | 10001 | N |
| 11512997-6 | 01-SEP-16 18:00 | 10001 | Y |
| 11512997-7 | 02-SEP-16 08:00 | 10001 | N |
| 11512997-1 | 01-SEP-16 08:00 | 10002 | N |
| 11512997-2 | 01-SEP-16 10:00 | 10002 | N |
| 11512997-3 | 01-SEP-16 12:00 | 10002 | N |
| 11512997-4 | 01-SEP-16 14:00 | 10002 | Y |
| 11512997-5 | 01-SEP-16 16:00 | 10002 | N |
| 11512997-6 | 01-SEP-16 18:00 | 10002 | Y |
| 11512997-7 | 02-SEP-16 08:00 | 10002 | Y |
我想生成以下结果,其中返回连续出现的最大次数,其中attended = 'N':
|person_id|consec_missed_max|
| 1001 | 5 |
| 1002 | 3 |
如何在Oracle(或ANSI) SQL中做到这一点?谢谢!
编辑:到目前为止,我已经试过了:
WITH t1 AS
(SELECT t.person_id,
row_number() over(PARTITION BY t.person_id ORDER BY t.starttime) AS idx
FROM the_table t
WHERE t.attended = 'N'),
t2 AS
(SELECT person_id, MAX(idx) max_idx FROM t1 GROUP BY person_id)
SELECT t1.person_id, COUNT(1) ct
FROM t1
JOIN t2
ON t1.person_id = t2.person_id
GROUP BY t1.person_id;
主要工作在因子子查询"prep"中。你们似乎对解析函数有点熟悉,但这还不够。该解决方案使用所谓的"tabibitosan"方法在一个或多个维度上创建具有相同特征的连续行组;在本例中,您希望将连续的N行与每个序列的不同组进行分组。这是通过两个不同的ROW_NUMBER()调用来完成的—一个仅由人员划分,另一个由人员和参加人员划分。如果需要的话,谷歌"tabibitosan"可以了解更多关于这个想法的信息。
with
inputs ( event_id, starttime, person_id, attended ) as (
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10001, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual
),
prep ( starttime, person_id, attended, gp ) as (
select starttime, person_id, attended,
row_number() over (partition by person_id order by starttime) -
row_number() over (partition by person_id, attended
order by starttime)
from inputs
),
counts ( person_id, consecutive_absences ) as (
select person_id, count(*)
from prep
where attended = 'N'
group by person_id, gp
)
select person_id, max(consecutive_absences) as max_consecutive_absences
from counts
group by person_id
order by person_id;
输出: PERSON_ID MAX_CONSECUTIVE_ABSENCES
---------- ---------------------------------------
10001 5
10002 3
如果您使用Oracle 12c
,则可以使用MATCH_RECOGNIZE
:
数据:
CREATE TABLE data AS
SELECT *
FROM (
with inputs ( event_id, starttime, person_id, attended ) as (
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10001, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual
)
SELECT * FROM inputs
);
和查询:
SELECT PERSON_ID, MAX(LEN) AS MAX_ABSENCES_IN_ROW
FROM data
MATCH_RECOGNIZE (
PARTITION BY PERSON_ID
ORDER BY STARTTIME
MEASURES FINAL COUNT(*) AS len
ALL ROWS PER MATCH
PATTERN(a b*)
DEFINE b AS attended = a.attended
)
WHERE attended = 'N'
GROUP BY PERSON_ID;
输出:
"PERSON_ID","MAX_ABSENCES_IN_ROW"
10001,5
10002,3
编辑:正如@mathguy指出的,它可以重写为:
SELECT PERSON_ID, MAX(LEN) AS MAX_ABSENCES_IN_ROW
FROM data
MATCH_RECOGNIZE (
PARTITION BY PERSON_ID
ORDER BY STARTTIME
MEASURES COUNT(*) AS len
PATTERN(a+)
DEFINE a AS attended = 'N'
)
GROUP BY PERSON_ID;
db<>小提琴演示