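SQL: counting days with gaps/overlaps

I am working on a "count days" problem that is almost identical to this one. I have a list of date ranges and need to count how many days were used, without double-counting overlapping days and while handling gaps. Same input and output.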

From: Markus Jarderot

Input
ID   d1           d2
 1   2011-08-01   2011-08-08
 1   2011-08-02   2011-08-06
 1   2011-08-03   2011-08-10
 1   2011-08-12   2011-08-14
 2   2011-08-01   2011-08-03
 2   2011-08-02   2011-08-06
 2   2011-08-05   2011-08-09
Output
ID   hold_days
 1          11
 2           8

SQL to find time elapsed from multiple overlapping intervals

But for the life of me, I cannot understand Markus Jarderot's solution.

SELECT DISTINCT
    t1.ID,
    t1.d1 AS date,
    -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2                   -- Join for any events occurring while this
    ON t2.ID = t1.ID                  -- is starting. If this is a start point,
    AND t2.d1 <> t1.d1                -- it won't match anything, which is what
    AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0

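Why does DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) pick MIN(d1) from the entire list, regardless of ID?

What does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Is it there to make sure only overlapping intervals are counted?

I assume the GROUP BY serves a similar purpose, so that identical periods for the same ID are discarded? I tried to trace the solution by hand but only got more confused.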

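This is mostly a copy of my answer here (including the explanation), but with grouping on the id column added. It should use a single table scan and does not require a recursive sub-query factoring clause (CTE) or self-joins.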

SQL Fiddle

Oracle 11g R2 Schema Setup

CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
  SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
  SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
  SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
  SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
  SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
  SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
  SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
  SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
  SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
  SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
  SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
  SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
  SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
  SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
  SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;

Query 1

SELECT id,
       SUM( days ) AS total_days
FROM   (
  SELECT id,
         dt - LAG( dt ) OVER ( PARTITION BY id
                               ORDER BY dt ) + 1 AS days,
         start_end
  FROM   (
    SELECT id,
           dt,
           CASE SUM( value ) OVER ( PARTITION BY id
                                    ORDER BY dt ASC, value DESC, ROWNUM ) * value
             WHEN 1 THEN 'start'
             WHEN 0 THEN 'end'
           END AS start_end
    FROM   your_table
    UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
  )
  WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id

Results

| ID | TOTAL_DAYS |
|----|------------|
|  1 |         25 |
|  2 |         13 |
|  3 |          9 |
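
To make the logic of Query 1 easier to follow, here is a minimal Python sketch of the same idea. It is not the Oracle code itself. Each row is split into a +1 event at start_date and a -1 event at end_date (the UNPIVOT), the events are ordered by date with starts before ends, and a running sum tracks how many ranges are open. A merged range starts when the sum reaches 1 on a start event and ends when it drops back to 0. The function name total_days and the sample rows (ids 2 and 3 from the setup above) are only for illustration.

```python
from collections import defaultdict
from datetime import date

def total_days(rows):
    """rows: iterable of (id, start_date, end_date). Returns {id: days}, counting both endpoints."""
    events = defaultdict(list)
    for id_, start, end in rows:
        events[id_].append((start, 1))    # a range opens
        events[id_].append((end, -1))     # a range closes
    totals = {}
    for id_, evs in events.items():
        # Same ordering as the query: by date, and starts (+1) before ends (-1).
        evs.sort(key=lambda e: (e[0], -e[1]))
        open_count, range_start, days = 0, None, 0
        for dt, value in evs:
            open_count += value
            if value == 1 and open_count == 1:       # 'start' of a merged range
                range_start = dt
            elif value == -1 and open_count == 0:    # 'end' of a merged range
                days += (dt - range_start).days + 1
        totals[id_] = days
    return totals

# Illustrative sample: rows K..Q from the schema setup (ids 2 and 3).
SAMPLE_ROWS = [
    (2, date(2011, 8, 1), date(2011, 8, 8)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 3), date(2011, 8, 10)),
    (2, date(2011, 8, 12), date(2011, 8, 14)),
    (3, date(2011, 8, 1), date(2011, 8, 3)),
    (3, date(2011, 8, 2), date(2011, 8, 6)),
    (3, date(2011, 8, 5), date(2011, 8, 9)),
]
print(total_days(SAMPLE_ROWS))  # {2: 13, 3: 9}
```

Because starts sort before ends on the same date, a range that begins on the day another one ends is merged into it rather than counted twice.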

The brute-force approach is to generate all the days (in a recursive query) and then count them:

with dates(id, day, d2) as
(
  select id, d1 as day, d2 from mytable
  union all
  select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;

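Unfortunately, there is a bug in some Oracle versions where recursive queries with dates do not work. So try this code and see whether it works on your system. (I have Oracle 11.2 and the bug is still there, so I guess you need Oracle 12c.)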
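
If the recursive query does not run on your system, the same brute-force idea can be checked outside the database. The following is only a rough Python sketch of what the query computes. The name count_distinct_days and the sample rows (the question's input) are illustrative. Note that this approach counts both endpoints of every range, so for the question's data it returns 13 and 9, whereas the expected output of 11 and 8 corresponds to d2 - d1 for each merged range.

```python
from datetime import date, timedelta

def count_distinct_days(rows):
    """rows: iterable of (id, d1, d2). Returns {id: number of distinct days covered}."""
    days_by_id = {}
    for id_, d1, d2 in rows:
        days = days_by_id.setdefault(id_, set())
        day = d1
        while day <= d2:                 # d1 .. d2 inclusive, like the recursive CTE
            days.add(day)                # the set plays the role of COUNT(DISTINCT day)
            day += timedelta(days=1)
    return {id_: len(days) for id_, days in sorted(days_by_id.items())}

# The question's input data.
SAMPLE_ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]
print(count_distinct_days(SAMPLE_ROWS))  # {1: 13, 2: 9}
```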

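I think Markus's idea is to find all start points that do not lie inside another range and all end points that do not lie inside another range. Then you go from the first start point to the first end point, from the next start point to the next end point, and so on. Since Markus does not use window functions to number the start and end points, he has to find a more complicated way to achieve this. Here is the query with ROW_NUMBER. Maybe it gives you a starting point for what to look for in Markus's query.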

select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
  select id, d1 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d1 > m2.d1 and m1.d1 <= m2.d2
  )
) startpoint
join
(
  select id, d2 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d2 >= m2.d1 and m1.d2 < m2.d2
  )
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
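
The pairing logic of the ROW_NUMBER query can also be traced by hand with a small Python sketch. This is an illustration under my own assumptions, not a translation of Markus's query. The name hold_days is made up, and the sketch collects start and end points in sets so that duplicate rows collapse, which the SQL above does not do.

```python
from datetime import date

def hold_days(rows):
    """rows: iterable of (id, d1, d2). Returns {id: sum of (end - start) per merged range}."""
    by_id = {}
    for id_, d1, d2 in rows:
        by_id.setdefault(id_, []).append((d1, d2))
    result = {}
    for id_, spans in sorted(by_id.items()):
        # Start points that no other range reaches (the first NOT EXISTS).
        starts = sorted({d1 for d1, _ in spans
                         if not any(o1 < d1 <= o2 for o1, o2 in spans)})
        # End points that no other range extends beyond (the second NOT EXISTS).
        ends = sorted({d2 for _, d2 in spans
                       if not any(o1 <= d2 < o2 for o1, o2 in spans)})
        # Pair the n-th start with the n-th end, like the join on rn.
        result[id_] = sum((end - start).days for start, end in zip(starts, ends))
    return result

# The question's input data.
SAMPLE_ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]
print(hold_days(SAMPLE_ROWS))  # {1: 11, 2: 8}
```

For ID 1 the surviving start points are 2011-08-01 and 2011-08-12 and the surviving end points are 2011-08-10 and 2011-08-14, which gives 9 + 2 = 11 days, matching the expected output in the question.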

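If all intervals start on different dates, consider them in ascending order of d1 and count the days from each d1 to the start of the next interval. You can discard any interval that is contained in another interval. The last interval has no successor, so its days run from d1 to its own d2.

This query should give the number of days for each interval: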

select a.id, a.d1, nvl(min(b.d1), a.d2) - a.d1 as days
from orders a
left join orders b
  on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
group by a.id, a.d1, a.d2  -- a.d2 must be grouped because it is used outside an aggregate

Then group by id and sum the days.
