我有一个Postgres 9.1数据库。我正在尝试生成每周的记录数量(对于给定的日期范围),并将其与前一年进行比较。
我有以下代码用于生成该系列:
select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
但是,我不知道如何将计数的记录与生成的日期连接起来。
因此,以以下记录为例:
Pt_ID exam_date
====== =========
1 2012-01-02
2 2012-01-02
3 2012-01-08
4 2012-01-08
1 2013-01-02
2 2013-01-02
3 2013-01-03
4 2013-01-04
1 2013-01-08
2 2013-01-10
3 2013-01-15
4 2013-01-24
我想让记录返回为:
series thisyr lastyr
=========== ===== =====
2013-01-01 4 2
2013-01-08 3 2
2013-01-15 1 0
2013-01-22 1 0
2013-01-29 0 0
不确定如何引用子搜索中的日期范围。感谢您的帮助。
简单的方法是用@jpw演示的CROSS JOIN来解决这个问题。然而,存在一些隐藏的问题:
-
无条件
CROSS JOIN
的性能会随着行数的增长而快速恶化。在聚合中处理这个庞大的派生表之前,总行数乘以您正在测试的周数。索引也无济于事。 -
从1月1日开始的几周会导致不一致ISO周可能是一种选择。请参见下文。
以下所有查询都大量使用exam_date
上的索引。一定要有一个。
仅联接到相关行
应该快得多:
SELECT d.day, d.thisyr
, count(t.exam_date) AS lastyr
FROM (
SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0 -- for 2nd join
, count(t.exam_date) AS thisyr
FROM generate_series('2013-01-01'::date
, '2013-01-31'::date -- last week overlaps with Feb.
, '7 days'::interval) d(day) -- returns timestamp
LEFT JOIN tbl t ON t.exam_date >= d.day::date
AND t.exam_date < d.day::date + 7
GROUP BY d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.day, d.thisyr
ORDER BY d.day;
这是从1月1日开始的几周,就像你原来的一样。正如所评论的,这产生了一些不一致之处:每周从每年不同的一天开始,由于我们在年底中断,一年中的最后一周只有1或2天(闰年)。
与ISO周相同
根据要求,请考虑ISO周,该周从周一开始,通常为期7天。但它们跨越了岁月的边界。根据EXTRACT()
:上的文件
周
一年中一天所在的周数。根据定义(ISO 8601),周从星期一开始年份包含当年的1月4日。换句话说,第一一年中的星期四在该年的第一周。
在ISO定义中,1月初的日期可能是前一年第52周或第53周的一部分12月下旬是明年第一周的一部分。对于例如,
2005-01-01
是2004年第53周的一部分,并且2006-01-01
是2005年第52周的一部分,而2012-12-31
是2013年第一周的一部分。建议使用isoyear
字段与CCD_ 8一起得到一致的结果。
以上查询用ISO周重写:
SELECT w AS isoweek
, day::text AS thisyr_monday, thisyr_ct
, day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
FROM (
SELECT w, day
, date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
, count(t.exam_date) AS thisyr_ct
FROM (
SELECT w
, date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
FROM generate_series(0, 4) w
) d
LEFT JOIN tbl t ON t.exam_date >= d.day
AND t.exam_date < d.day + 7
GROUP BY d.w, d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.w, d.day, d.day0, d.thisyr_ct
ORDER BY d.w, d.day;
1月4日总是在一年中ISO的第一周。因此,这个表达式得到了给定年份的第一个ISO周的星期一:
date_trunc('week', '2012-01-04'::date)::date
使用EXTRACT()
简化
由于ISO周数与EXTRACT()
返回的周数一致,我们可以简化查询。首先,一个简短而简单的形式:
SELECT w AS isoweek
, COALESCE(thisyr_ct, 0) AS thisyr_ct
, COALESCE(lastyr_ct, 0) AS lastyr_ct
FROM generate_series(1, 5) w
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2013
GROUP BY 1
) t13 USING (w)
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2012
GROUP BY 1
) t12 USING (w);
优化查询
更多细节相同,并针对性能进行了优化
WITH params AS ( -- enter parameters here, once
SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
, date_trunc('week', '2013-01-04'::date)::date AS this_start
, date_trunc('week', '2014-01-04'::date)::date AS next_start
, 1 AS week_1
, 5 AS week_n -- show weeks 1 - 5
)
SELECT w.w AS isoweek
, p.this_start + 7 * (w - 1) AS thisyr_monday
, COALESCE(t13.ct, 0) AS thisyr_ct
, p.last_start + 7 * (w - 1) AS lastyr_monday
, COALESCE(t12.ct, 0) AS lastyr_ct
FROM params p
, generate_series(p.week_1, p.week_n) w(w)
LEFT JOIN (
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.this_start -- only relevant dates
AND t.exam_date < p.this_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.next_start -- don't cross over into next year
GROUP BY 1
) t13 USING (w)
LEFT JOIN ( -- same for last year
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.last_start
AND t.exam_date < p.last_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.this_start
GROUP BY 1
) t12 USING (w);
在指数支持下,这应该非常快,并且可以很容易地适应所选择的区间。上次查询中generate_series()
的隐式JOIN LATERAL
需要Postgres 9.3。
SQL Fiddle
使用cross join
应该可以工作,我将在下面粘贴SQL Fiddle的markdown输出。您的2013-01-08系列的样本输出似乎不正确:thisyr应该是2,而不是3。虽然这可能不是最好的方法,但我的Postgresql知识还有很多不足之处。
SQL Fiddle
PostgreSQL 9.2.4架构设置:
CREATE TABLE Table1
("Pt_ID" varchar(6), "exam_date" date);
INSERT INTO Table1
("Pt_ID", "exam_date")
VALUES
('1', '2012-01-02'),('2', '2012-01-02'),
('3', '2012-01-08'),('4', '2012-01-08'),
('1', '2013-01-02'),('2', '2013-01-02'),
('3', '2013-01-03'),('4', '2013-01-04'),
('1', '2013-01-08'),('2', '2013-01-10'),
('3', '2013-01-15'),('4', '2013-01-24');
查询1:
select
series,
sum (
case
when exam_date
between series and series + '6 day'::interval
then 1
else 0
end
) as thisyr,
sum (
case
when exam_date + '1 year'::interval
between series and series + '6 day'::interval
then 1 else 0
end
) as lastyr
from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series
结果:
| SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 | 4 | 2 |
| January, 08 2013 00:00:00+0000 | 2 | 2 |
| January, 15 2013 00:00:00+0000 | 1 | 0 |
| January, 22 2013 00:00:00+0000 | 1 | 0 |
| January, 29 2013 00:00:00+0000 | 0 | 0 |