How to get the date of a status change in a time series



I have the following problem. I have a table of machine life-cycle events over time:

DROP TABLE IF EXISTS #machineStatus;
CREATE TABLE #machineStatus
(
    machineID     VARCHAR(255),
    machineStatus VARCHAR(255),
    statusDate    DATETIME
);
INSERT INTO #machineStatus (machineId, statusDate, machineStatus)
VALUES
('01255999', '2019-11-01',  '1 - InStorage'),
('01255999', '2019-12-01',  '1 - InStorage'),
('01255999', '2020-01-01',  '1 - InStorage'),
('01255999', '2020-02-01',  '1 - InStorage'),
('01255999', '2020-03-01',  '1 - InStorage'),
('01255999', '2020-04-01',  '1 - InStorage'),
('01255999', '2020-05-01',  '1 - InStorage'),
('01255999', '2020-06-01',  '1 - InStorage'),
('01255999', '2020-07-01',  '1 - InStorage'),
('01255999', '2020-08-01',  '1 - InStorage'),
('01255999', '2020-09-01',  '1 - InStorage'),
('01255999', '2020-11-01',  '1 - InStorage'),
('01255999', '2020-12-01',  '1 - InStorage'),
('01255999', '2020-12-15',  '1 - InStorage'),
('01255999', '2021-01-01',  '2 - RentedOut'),
('01255999', '2021-03-01',  '1 - InStorage'),
('01255999', '2021-04-01',  '1 - InStorage'),
('01255999', '2021-04-02',  '2 - RentedOut'),
('01255999', '2021-04-05',  '3 - Service'),
('01255999', '2021-04-15',  '4 - Repairs'),
('01255999', '2021-04-20',  '2 - RentedOut'),
('01255999', '2021-05-27',  '5 - Sold')

I need to add a new column that shows the most recent date on which the status changed:

SELECT
    s.*,
    (SELECT MAX(ss.statusDate)
     FROM #machineStatus ss
     WHERE ss.machineId = s.machineId
       AND ss.machineStatus <> s.machineStatus
       AND ss.statusDate < s.statusDate) AS statusChangeDate
FROM #machineStatus s
ORDER BY s.statusDate

Run it on SQLFiddle

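I can't quite figure this out. My problem is that I don't know how to get the date of the machine's first (earliest) status. In the query's output, every NULL in the statusChangeDate column should instead be 2019-11-01, like this: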

|machineID  | machineStatus |           statusDate |     statusChangeDate |
|-----------|-------------- |----------------------|----------------------|
|  01255999 | 1 - InStorage | 2019-11-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2019-12-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-01-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-02-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-03-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-04-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-05-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-06-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-07-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-08-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-09-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-11-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-12-01T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2020-12-15T00:00:00Z | 2019-11-01T00:00:00Z |
|  01255999 | 2 - RentedOut | 2021-01-01T00:00:00Z | 2020-12-15T00:00:00Z |
|  01255999 | 1 - InStorage | 2021-03-01T00:00:00Z | 2021-01-01T00:00:00Z |
|  01255999 | 1 - InStorage | 2021-04-01T00:00:00Z | 2021-01-01T00:00:00Z |
|  01255999 | 2 - RentedOut | 2021-04-02T00:00:00Z | 2021-04-01T00:00:00Z |
|  01255999 | 3 - Service   | 2021-04-05T00:00:00Z | 2021-04-02T00:00:00Z |
|  01255999 | 4 - Repairs   | 2021-04-15T00:00:00Z | 2021-04-05T00:00:00Z |
|  01255999 | 2 - RentedOut | 2021-04-20T00:00:00Z | 2021-04-15T00:00:00Z |
|  01255999 | 5 - Sold      | 2021-05-27T00:00:00Z | 2021-04-20T00:00:00Z |

Any help is appreciated. Thank you!

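Do it in two steps.

First, use LAG() OVER () to check whether the status has changed, and record the date of each status change.

Then use MAX() OVER () to propagate those dates forward, filling in the NULLs (on the rows where the status did not change).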

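WITH check_for_changes AS
(
    SELECT
        *,
        -- NULL when the status equals the previous row's status;
        -- otherwise (including the first row) this row's own date
        CASE
            WHEN LAG(machineStatus) OVER (PARTITION BY machineID ORDER BY statusDate) = machineStatus
                THEN NULL
            ELSE statusDate
        END AS statusChangeDate
    FROM
        #machineStatus
)
SELECT
    *,
    -- the running maximum carries the latest change date forward over the NULLs
    MAX(statusChangeDate) OVER (PARTITION BY machineID ORDER BY statusDate) AS lastStatusChangeDate
FROM
    check_for_changes
ORDER BY
    statusDate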

http://sqlfiddle.com/#!195.18e8/1
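
Note that the query above reports the first date of the current status run (for the 2021-01-01 row it returns 2021-01-01), whereas the expected table in the question shows the last date of the previous status (2020-12-15). If the latter is what is wanted, one possible variant is to record the previous row's date at each change instead. This is only a sketch: it assumes the same #machineStatus table and that statusDate values are unique per machine, and the column name prevStatusEndDate is made up for illustration.

```sql
WITH check_for_changes AS
(
    SELECT
        *,
        CASE
            WHEN LAG(machineStatus) OVER (PARTITION BY machineID ORDER BY statusDate) = machineStatus
                THEN NULL
            -- On a change, take the previous row's date. The first row has no
            -- previous row, so the third LAG argument falls back to its own date.
            ELSE LAG(statusDate, 1, statusDate) OVER (PARTITION BY machineID ORDER BY statusDate)
        END AS prevStatusEndDate   -- hypothetical column name
    FROM #machineStatus
)
SELECT
    machineID,
    machineStatus,
    statusDate,
    -- running maximum fills the NULLs with the most recent recorded date
    MAX(prevStatusEndDate) OVER (PARTITION BY machineID ORDER BY statusDate) AS statusChangeDate
FROM check_for_changes
ORDER BY statusDate;
```

On the sample data this should give 2019-11-01 for the whole first InStorage run and, for every later row, the last date of the preceding status, as in the expected table.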

One way is to compute a group number and take the last date from the previous group.

with g as (
    select machineId, statusDate, machineStatus,
           sum(flag) over(partition by machineId order by statusDate) grp
    from (
        select *,
               case lag(machineStatus, 1, machineStatus) over(partition by machineId order by statusDate)
                   when machineStatus then 0 else 1
               end flag
        from #machineStatus
    ) s
)
select machineId, statusDate, machineStatus,
       (select top(1) g2.statusDate
        from g g2
        where g2.machineId = g1.machineId and g2.grp < g1.grp
        order by g2.statusDate desc) lastChange
from g g1
order by statusDate

db<>fiddle
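
As written, the query above still returns NULL for the first group, because no group has a smaller grp value. A possible way to fill those rows, sketched here under the assumption that the machine's earliest statusDate is the desired fallback (as the question asks), is to wrap the subquery in COALESCE:

```sql
with g as (
    select machineId, statusDate, machineStatus,
           sum(flag) over(partition by machineId order by statusDate) grp
    from (
        select *,
               case lag(machineStatus, 1, machineStatus) over(partition by machineId order by statusDate)
                   when machineStatus then 0 else 1
               end flag
        from #machineStatus
    ) s
)
select machineId, statusDate, machineStatus,
       coalesce(
           -- last date of any earlier group, as in the query above
           (select top(1) g2.statusDate
            from g g2
            where g2.machineId = g1.machineId and g2.grp < g1.grp
            order by g2.statusDate desc),
           -- assumed fallback for the first group: the machine's earliest date
           min(statusDate) over(partition by machineId)
       ) lastChange
from g g1
order by statusDate;
```

Only the first group is affected by the fallback; every other row keeps the value the subquery already produced.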

You can use COALESCE to get rid of the NULLs:

SELECT  s.*,
        COALESCE(
            (
                SELECT  MAX(ss.statusDate)
                FROM    #machineStatus ss
                WHERE   ss.machineID = s.machineID
                  AND   ss.machineStatus <> s.machineStatus
                  AND   ss.statusDate < s.statusDate
            ),
            (
                SELECT  MIN(ss.statusDate)
                FROM    #machineStatus ss
                WHERE   ss.machineID = s.machineID
                  AND   ss.machineStatus = s.machineStatus
                  AND   ss.statusDate <= s.statusDate
            )
        ) AS statusChangeDate
FROM    #machineStatus s
ORDER BY s.statusDate;
