这个问题与我最近问的另一个问题几乎相同,有一个非常重要的区别,即这些事务是贷款事务,因此,项目可能在数据中多次出现。这就是我现在使用LEAD
的原因。有了这个澄清,我把我的问题转发到下面。
我有一个Oracle数据库中的事务表。我正试图为涉及多个交易类型的交付系统拉一个报告。"请求"类型实际上可以是四个子类型之一(本例中为"A"、"B"、"C"one_answers"D"),"交付"类型可以是四个子类型之一("PULL"、"PICKUP"、"MAIL")。从"请求"到"交付"可以有1到5个事务,并且许多"交付"类型也是中间事务。例子:
Item | Transaction | Timestamp
001 | REQ-A | 2014-07-31T09:51:32Z
002 | REQ-B | 2014-07-31T09:55:53Z
003 | REQ-C | 2014-07-31T10:01:15Z
004 | REQ-D | 2014-07-31T10:02:29Z
005 | REQ-A | 2014-07-31T10:05:47Z
002 | PULL | 2014-07-31T10:20:04Z
002 | MAIL | 2014-07-31T10:20:06Z
001 | PULL | 2014-07-31T10:22:21Z
001 | TRANSFER | 2014-07-31T10:22:23Z
003 | PULL | 2014-07-31T10:24:10Z
003 | TRANSFER | 2014-07-31T10:24:12Z
004 | PULL | 2014-07-31T10:26:28Z
005 | PULL | 2014-07-31T10:28:42Z
005 | TRANSFER | 2014-07-31T10:28:44Z
001 | ARRIVE | 2014-07-31T11:45:01Z
001 | PICKUP | 2014-07-31T11:45:02Z
003 | ARRIVE | 2014-07-31T11:47:44Z
003 | PICKUP | 2014-07-31T11:47:45Z
005 | ARRIVE | 2014-07-31T11:49:45Z
005 | PICKUP | 2014-07-31T11:49:46Z
我需要的是这样一份报告:
Item | Start Tx | End Tx | Time
001 | REQ-A | PICKUP | 1:53:30
002 | REQ-B | MAIL | 0:24:13
003 | REQ-C | PICKUP | 1:46:30
004 | REQ-D | PULL | 0:23:59
005 | REQ-A | PICKUP | 1:43:59
What I have:
Item | Start Tx | End Tx | Time
001 | REQ-A | PULL | 0:30:49
001 | REQ-A | TRANSFER | 0:30:51
001 | REQ-A | ARRIVE | 1:53:29
001 | REQ-A | PICKUP | 1:53:30
002 | REQ-B | PULL | 0:24:11
002 | REQ-B | MAIL | 0:24:13
003 | REQ-C | PULL | 0:22:55
003 | REQ-C | TRANSFER | 0:22:57
003 | REQ-C | ARRIVE | 1:46:29
003 | REQ-C | PICKUP | 1:46:30
004 | REQ-D | PULL | 0:23:59
005 | REQ-A | PULL | 0:22:55
005 | REQ-A | TRANSFER | 0:22:57
005 | REQ-A | ARRIVE | 1:43:58
005 | REQ-A | PICKUP | 1:43:59
我正在做什么来获取数据:
SELECT Item, Transaction, nextTransaction, nextTimestamp - Timestamp
FROM (
SELECT Item, Transaction, Timestamp,
LEAD(Transaction, 5) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
LEAD(Timestamp, 5) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
FROM Transactions
UNION ALL
SELECT Item, Transaction, Timestamp,
LEAD(Transaction, 4) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
LEAD(Timestamp, 4) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
FROM Transactions
UNION ALL
SELECT Item, Transaction, Timestamp,
LEAD(Transaction, 3) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
LEAD(Timestamp, 3) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
FROM Transactions
UNION ALL
SELECT Item, Transaction, Timestamp,
LEAD(Transaction, 2) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
LEAD(Timestamp, 2) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
FROM Transactions
UNION ALL
SELECT Item, Transaction, Timestamp,
LEAD(Transaction, 1) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
LEAD(Timestamp, 1) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
FROM Transactions
)
WHERE nextTransaction IS NOT NULL
AND Transaction IN ('REQ-A', 'REQ-B', 'REQ-C', 'REQ-D')
我可以在脚本中手动解析它(也许这实际上是最好的操作过程),但为了学习,我想知道是否有可能单独使用SQL来执行此操作。
为了澄清"贷款"部分,该表中还有其他交易用于回报和其他形式的处理,这些交易与本报告无关,而不是作为其他交易类型存在。一旦项被返回,它就可以再次经历请求周期。例如,对于物品001
,它可以遵循物品002
的循环(REQ -> MAIL),然后它可以获得"未在货架上"事务,或非请求贷款,或其他一些用例。然后它可以返回通过REQ->拾取循环,或REQ->拉循环。
这是一个缺口和孤岛问题,但是由REQ
事务定义的孤岛使它比某些问题更复杂。
你可以使用嵌套的前置和滞后函数和一些操作来得到你需要的:
select distinct item,
coalesce(start_tran,
lag(start_tran) over (partition by item order by timestamp)) as start_tran,
coalesce(end_tran,
lead(end_tran) over (partition by item order by timestamp)) as end_tran,
coalesce(end_time,
lead(end_time) over (partition by item order by timestamp))
- coalesce(start_time,
lag(start_time) over (partition by item order by timestamp)) as time
from (
select item, timestamp, start_tran, start_time, end_tran, end_time
from (
select item,
timestamp,
case when lag_tran is null or transaction like 'REQ%'
then transaction end as start_tran,
case when lag_tran is null or transaction like 'REQ%'
then timestamp end as start_time,
case when lead_tran is null or lead_tran like 'REQ%'
then transaction end as end_tran,
case when lead_tran is null or lead_tran like 'REQ%'
then timestamp end as end_time
from (
select item, transaction, timestamp,
lag(transaction)
over (partition by item order by timestamp) as lag_tran,
lead(transaction)
over (partition by item order by timestamp) as lead_tran
from transactions
)
)
where start_tran is not null or end_tran is not null
)
order by item, start_tran;
对于项1和项2的第二个循环,可以给出:
ITEM START_TRAN END_TRAN TIME
---------- ---------- ---------- -----------
1 REQ-A PICKUP 0 1:53:30.0
1 REQ-E PICKUP 0 1:23:30.0
2 REQ-B MAIL 0 0:24:13.0
2 REQ-F REQ-F 0 0:0:0.0
3 REQ-C PICKUP 0 1:46:30.0
4 REQ-D PULL 0 0:23:59.0
5 REQ-A PICKUP 0 1:43:59.0
SQL Fiddle显示所有中间步骤。
这并不像乍一看那么可怕。最内层的查询获取原始数据,并为领先和滞后事务添加额外的列。只取第一组item-1记录,将是:
ITEM TRANSACTION TIMESTAMP LAG_TRAN LEAD_TRAN
---------- ----------- ------------------------ ---------- ----------
1 REQ-A 2014-07-31T09:51:32Z PULL
1 PULL 2014-07-31T10:22:21Z REQ-A TRANSFER
1 TRANSFER 2014-07-31T10:22:23Z PULL ARRIVE
1 ARRIVE 2014-07-31T11:45:01Z TRANSFER PICKUP
1 PICKUP 2014-07-31T11:45:02Z ARRIVE REQ-E
注意REQ-E
弹出作为最后一个lead_tran
?这是该项的第二个记录周期的第一个transaction
,以后会很有用。下一级查询使用这些前导和滞后值,并将REQ
值作为开始和结束标记,并使用该信息为每个循环清除除第一个和最后一个记录外的所有内容。
ITEM TIMESTAMP START_TRAN START_TIME END_TRAN END_TIME
---------- ------------------------ ---------- ------------------------ ---------- ------------------------
1 2014-07-31T09:51:32Z REQ-A 2014-07-31T09:51:32Z
1 2014-07-31T10:22:21Z
1 2014-07-31T10:22:23Z
1 2014-07-31T11:45:01Z
1 2014-07-31T11:45:02Z PICKUP 2014-07-31T11:45:02Z
下一层的查询删除了所有不代表开始或结束(或两者都是-参见REQ-F
)的行,因为我们对它们不感兴趣:
ITEM TIMESTAMP START_TRAN START_TIME END_TRAN END_TIME
---------- ------------------------ ---------- ------------------------ ---------- ------------------------
1 2014-07-31T09:51:32Z REQ-A 2014-07-31T09:51:32Z
1 2014-07-31T11:45:02Z PICKUP 2014-07-31T11:45:02Z
现在每个循环都有成对的行(或者REQ-F
的单行)。最后一个关卡再次使用铅和滞后来填补空白;如果start_tran
是空的,那么这是一个结束行,我们应该使用前一行的开始数据;如果end_tran
是空的,那么这是一个开始行,我们应该使用下一行的结束数据。
ITEM START_TRAN START_TIME END_TRAN END_TIME TIME
1 REQ-A 2014-07-31T09:51:32Z PICKUP 2014-07-31T11:45:02Z 0 1:53:30.0
1 REQ-A 2014-07-31T09:51:32Z PICKUP 2014-07-31T11:45:02Z 0 1:53:30.0
这使得两行相同,因此distinct
删除了重复项。
这应该会给出相同的结果。我正在复制戈登的答案,它仍然有效
select item,
min(transaction) keep (dense_rank first order by timestamp) as StartTx,
min(transaction) keep (dense_rank last order by timestamp) as EndTx,
max(timestamp) - min(timestamp)
from transactions t
group by item;
即使在txn中有重复项,它也将由分析函数处理。