Find rows with adjacent date ranges and accumulate their durations



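My PostgreSQL database stores school vacations, public holidays, and weekend dates so that parents can plan their vacations. School vacations are often adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjacent weekends or public holidays.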

Sample data

locations

SELECT id, name, is_federal_state 
FROM locations 
WHERE is_federal_state = true;
| id | name              | is_federal_state |
|----|-------------------|------------------|
| 2  | Baden-Württemberg | true             |
| 3  | Bayern            | true             |

holiday_or_vacation_types

SELECT id, name FROM holiday_or_vacation_types;
| id | name                  |
|----|-----------------------|
| 1  | Herbst                |
| 8  | Wochenende            |

"Herbst" is German for "autumn", and "Wochenende" is German for "weekend".

periods

SELECT id, starts_on, ends_on, holiday_or_vacation_type_id 
FROM periods 
WHERE location_id = 2 
ORDER BY starts_on;
| id  | starts_on    | ends_on      | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 8                           |

Task

I want to select all periods where location_id equals 2, and I want to calculate the duration of each period in days. This can be done with the following SQL query:

SELECT id, starts_on, ends_on, 
(ends_on - starts_on + 1) AS duration, 
holiday_or_vacation_type_id 
FROM periods 
WHERE location_id = 2 
ORDER BY starts_on;
| id  | starts_on    | ends_on      | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 8                           |

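Anyone looking at a calendar will see that ids 670 (weekend), 532 (autumn vacation) and 533 (autumn vacation) are adjacent, so together they add up to a 6-day vacation. So far I have used a program to compute this value, but that takes quite a lot of resources (the actual table contains about 500,000 entries).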

Question 1

Which SQL query would produce the following output (with the added real_duration column)? Is this even possible in SQL?

| id  | starts_on    | ends_on      | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 6             | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 6             | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 6             | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 2             | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 2             | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 2             | 8                           |

Question 2

Is it possible to list the adjacent periods in a part_of_range field? This would be the result. Can this be done in SQL?

| id  | starts_on    | ends_on      | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 670,532,533   | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 670,532,533   | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 670,532,533   | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        |               | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        |               | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        |               | 8                           |

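This is a gaps-and-islands problem. In this case, you can use lag() to fetch the end date of the previous period and detect where each island starts (a row whose start is more than one day after the previous end). A cumulative count of those island starts then gives every row a group number that is shared by all adjacent periods.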
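To make the two steps concrete, here is a minimal sketch run against the six sample rows for location 2, inlined as a VALUES list so that it is self-contained. The names sample_periods, flagged and is_island_start are made up for this illustration. It assumes that periods never overlap and that "adjacent" means the next period starts exactly one day after the previous one ends.

```sql
-- Sample rows from the question (location_id = 2), inlined for illustration.
WITH sample_periods (id, starts_on, ends_on) AS (
    VALUES (670, DATE '2019-10-26', DATE '2019-10-27'),
           (532, DATE '2019-10-28', DATE '2019-10-30'),
           (533, DATE '2019-10-31', DATE '2019-10-31'),
           (671, DATE '2019-11-02', DATE '2019-11-03'),
           (672, DATE '2019-11-09', DATE '2019-11-10'),
           (673, DATE '2019-11-16', DATE '2019-11-17')
),
flagged AS (
    -- Step 1: look back at the end date of the previous period.
    SELECT id, starts_on, ends_on,
           lag(ends_on) OVER (ORDER BY starts_on) AS prev_eo
    FROM sample_periods
)
SELECT id, starts_on, ends_on, prev_eo,
       -- A row starts a new island if it has no predecessor or a gap precedes it.
       (prev_eo IS NULL OR prev_eo < starts_on - 1) AS is_island_start,
       -- Step 2: running count of gaps seen so far = island number.
       count(*) FILTER (WHERE prev_eo < starts_on - 1)
           OVER (ORDER BY starts_on) AS grp
FROM flagged
ORDER BY starts_on;
```

With this data, rows 670, 532 and 533 all get grp = 0, while 671, 672 and 673 get grp = 1, 2 and 3. The first row has a NULL prev_eo, so the FILTER condition does not count it and numbering starts at 0.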

The final step is an aggregation (using window functions):

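SELECT p.*,
       -- Length of the whole island: latest end minus earliest start, inclusive.
       (max(ends_on) OVER (PARTITION BY location_id, grp) - min(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
       -- Ids of all periods in the same island (partitioned by grp as well, not only by location_id).
       array_agg(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
             -- Running count of island starts: increases whenever a gap precedes the row.
             count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
      FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
                   -- End date of the previous period at the same location.
                   lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
            FROM periods
           ) p
     ) p;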
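For Question 2, the expected output leaves part_of_range empty when a period stands alone. One possible variation, sketched here under the assumption that the periods table looks as described in the question, is to wrap the id list in a CASE that only fires for islands with more than one row. The window name w is an arbitrary choice, and string_agg is used instead of array_agg only to get the comma-separated text shown in the question.

```sql
SELECT id, starts_on, ends_on,
       ends_on - starts_on + 1 AS duration,
       max(ends_on) OVER w - min(starts_on) OVER w + 1 AS real_duration,
       -- List the ids only when the island contains more than one period.
       CASE WHEN count(*) OVER w > 1
            THEN string_agg(id::text, ',') OVER w
       END AS part_of_range,
       holiday_or_vacation_type_id
FROM (
    SELECT p.*,
           -- Island number: running count of rows that are preceded by a gap.
           count(*) FILTER (WHERE prev_eo < starts_on - 1)
               OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
    FROM (
        SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
               lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
        FROM periods
        WHERE location_id = 2
    ) p
) p
-- The explicit frame makes every aggregate see the whole island, in date order.
WINDOW w AS (PARTITION BY location_id, grp
             ORDER BY starts_on
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY starts_on;
```

Filtering on location_id in the innermost subquery keeps the window functions from scanning other locations. With roughly 500,000 rows, an index on (location_id, starts_on) would probably help, though that is untested here.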
