Handling generate_series() in queries with date or timestamp with / without time zone



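I have a query that generates a report based on a series of dates, grouped by date and employee_id. The dates should be based on a particular time zone, in this case 'Asia/Kuala_Lumpur'. But this can change depending on the user's time zone.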


SELECT
   d::date AT TIME ZONE 'Asia/Kuala_Lumpur' AS created_date,
   e.id,
   e.name,
   e.division_id,
   ARRAY_AGG(
      a.id
   ) as rows,
   MIN(a.created_at) FILTER (WHERE a.activity_type = 1) as min_time_in,
   MAX(a.created_at) FILTER (WHERE a.activity_type = 2) as max_time_out,
   ARRAY_AGG(
      CASE
         WHEN a.activity_type = 1
         THEN a.created_at
         ELSE NULL
      END
   ) as check_ins,
   ARRAY_AGG(
      CASE
         WHEN a.activity_type = 2
         THEN a.created_at
         ELSE NULL
      END
   ) as check_outs
FROM   (SELECT MIN(created_at), MAX(created_at) FROM attendance) AS r(startdate,enddate)
     , generate_series(
          startdate::timestamp,
          enddate::timestamp,
          interval '1 day') g(d)
CROSS JOIN  employee e
LEFT JOIN   attendance a ON a.created_at::date = d::date AND e.id = a.employee_id
where d::date = date '2020-11-20' and division_id = 1
GROUP BY
   created_date
 , e.id
 , e.name
 , e.division_id
ORDER BY
   created_date
 , e.id;

Definition and sample data for attendance:

CREATE TABLE attendance (
   id int,
   employee_id int,
   activity_type int,
   created_at timestamp with time zone NOT NULL
);
INSERT INTO attendance VALUES
( 1, 1, 1,'2020-11-18 07:10:25 +00:00'),
( 2, 2, 1,'2020-11-18 07:30:25 +00:00'),
( 3, 3, 1,'2020-11-18 07:50:25 +00:00'),
( 4, 2, 2,'2020-11-18 19:10:25 +00:00'),
( 5, 3, 2,'2020-11-18 19:22:38 +00:00'),
( 6, 1, 2,'2020-11-18 20:01:05 +00:00'),
( 7, 1, 1,'2020-11-19 07:11:23 +00:00'),
( 8, 1, 2,'2020-11-19 16:21:53 +00:00'), -- <-- Asia/Kuala_Lumpur +8: should fall on 2020-11-20 (see the check_outs field in the result)
( 9, 1, 1,'2020-11-19 19:11:23 +00:00'), -- <-- Asia/Kuala_Lumpur +8: should fall on 2020-11-20 (see the check_ins field in the result)
(10, 1, 2,'2020-11-19 20:21:53 +00:00'), -- <-- Asia/Kuala_Lumpur +8: should fall on 2020-11-20 (see the check_outs field in the result)
(11, 1, 1,'2020-11-20 07:41:38 +00:00'),
(12, 1, 2,'2020-11-20 08:52:01 +00:00');
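A quick way to check the annotations above is to convert the sample timestamps to local time. This is a minimal sketch that inlines rows 7 to 12 of the sample data as a VALUES list, so it runs without the table:

```sql
-- Rows 7-12 of the sample data, inlined so the query is self-contained
SELECT id
     , created_at AT TIME ZONE 'Asia/Kuala_Lumpur'         AS local_ts    -- timestamptz -> local timestamp
     , (created_at AT TIME ZONE 'Asia/Kuala_Lumpur')::date AS local_date  -- calendar day in Kuala Lumpur
FROM  (VALUES
         ( 7, timestamptz '2020-11-19 07:11:23 +00:00')
       , ( 8, timestamptz '2020-11-19 16:21:53 +00:00')
       , ( 9, timestamptz '2020-11-19 19:11:23 +00:00')
       , (10, timestamptz '2020-11-19 20:21:53 +00:00')
       , (11, timestamptz '2020-11-20 07:41:38 +00:00')
       , (12, timestamptz '2020-11-20 08:52:01 +00:00')
      ) t(id, created_at)
ORDER  BY id;
```

Row 7 should come out on 2020-11-19 and rows 8 to 12 on 2020-11-20, matching the annotations. Asia/Kuala_Lumpur observes no daylight saving time, so the offset is a constant +08 for these dates.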

Here is a fiddle to test.

The query does not include rows 8-10 in the output for time zone Asia/Kuala_Lumpur (+8), even though it should. See the "rows" field in the result.

How can I fix the query so that it generates the report based on dates in the given time zone? (Meaning I can change Asia/Kuala_Lumpur to America/New_York etc.)

I was told to do something like this:

where created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'
and   created_at <  timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' + interval '1 day'

But I don't know how to use it. It doesn't seem to work properly in this fiddle. It should include rows 8, 9, 10, 11 and 12, but only shows rows 8, 9 and 10.

Database design

Consider some modifications to your setup:

CREATE TABLE employee (
id           int PRIMARY KEY  -- !
, name         text             -- do NOT use char(n) !
, division_id  int
);
CREATE  TABLE attendance (
id             int PRIMARY KEY  --!
, employee_id    int NOT NULL REFERENCES employee -- FK!
, activity_type  int
, created_at     timestamptz NOT NULL
);

Defining a PK makes it easier to aggregate rows, because the PK covers the whole row in the GROUP BY clause. See:

  • Why can't I exclude dependent columns from `GROUP BY` when I aggregate by a key?
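A minimal sketch of the point about the PK, using a hypothetical temporary table employee_demo (mirroring the suggested employee table, with made-up names) and a few (id, employee_id) pairs taken from the sample data:

```sql
-- Hypothetical demo table mirroring the suggested employee table
CREATE TEMP TABLE employee_demo (
   id           int PRIMARY KEY  -- the PK is what makes GROUP BY e.id sufficient
 , name         text
 , division_id  int
);
INSERT INTO employee_demo VALUES (1, 'emp one', 1), (2, 'emp two', 1);  -- hypothetical rows

-- name and division_id are functionally dependent on the PK,
-- so they may appear un-aggregated although only e.id is grouped.
SELECT e.id, e.name, e.division_id, count(a.id) AS n_events
FROM   employee_demo e
LEFT   JOIN (VALUES (1, 1), (2, 2), (4, 2), (6, 1)) a(id, employee_id)  -- (id, employee_id) pairs from the sample data
       ON a.employee_id = e.id
GROUP  BY e.id
ORDER  BY e.id;
```

Without the PRIMARY KEY on id, the same query fails with an error demanding that e.name appear in the GROUP BY clause or be used in an aggregate function.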

I would not use "name" as column name. It's not descriptive; every other column could be named "name". Consider:

  • Using data type "text" for storing strings
  • How to implement a many-to-many relationship in PostgreSQL?

Query

SELECT *
FROM  (        -- complete employee/date grid for division in range
   SELECT g.d::date AS the_date, id AS employee_id, name, division_id
   FROM  (
      SELECT generate_series(date_trunc('day', MIN(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur')  -- start at local midnight, so no day is skipped
                           , MAX(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur'
                           , interval '1 day')
      FROM   attendance
      ) g(d)
   CROSS  JOIN employee e
   WHERE  e.division_id = 1
   ) de
LEFT   JOIN (  -- checkins & checkouts per employee/date for division in range
   SELECT employee_id, ts::date AS the_date
        , array_agg(id) AS rows
        , min(ts)             FILTER (WHERE activity_type = 1) AS min_check_in
        , max(ts)             FILTER (WHERE activity_type = 2) AS max_check_out
        , array_agg(ts::time) FILTER (WHERE activity_type = 1) AS check_ins
        , array_agg(ts::time) FILTER (WHERE activity_type = 2) AS check_outs
   FROM  (
      SELECT a.id, a.employee_id, a.activity_type, a.created_at AT TIME ZONE 'Asia/Kuala_Lumpur' AS ts  -- convert to timestamp
      FROM   employee   e
      JOIN   attendance a ON a.employee_id = e.id
   -- WHERE  a.created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' -- "sargable" expressions
   -- AND    a.created_at <  timestamp '2020-11-21' AT TIME ZONE 'Asia/Kuala_Lumpur' -- exclusive upper bound (includes all of 2020-11-20)
      AND    e.division_id = 1  -- while the WHERE lines are commented out, this extends the join condition
      ORDER  BY a.employee_id, a.created_at, a.activity_type  -- optional to guarantee sorted arrays
      ) sub
   GROUP  BY 1, 2
   ) a USING (the_date, employee_id)
ORDER  BY 1, 2;

db<>fiddle here

Note that my query outputs local date and time for Asia/Kuala_Lumpur:

test=> SELECT timestamptz '2020-11-20 08:52:01 +0' AT TIME ZONE 'Asia/Kuala_Lumpur' AS local_ts;
local_ts       
---------------------
2020-11-20 16:52:01
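To see why the explicit time zone matters, compare a plain cast with AT TIME ZONE for row 8 of the sample data. A sketch, assuming the session time zone is set to UTC as shown:

```sql
-- Make the session setting explicit; the plain cast below depends on it.
SET TIME ZONE 'UTC';

SELECT ts::date                                    AS cast_date   -- 2020-11-19: day in the session time zone (UTC)
     , (ts AT TIME ZONE 'Asia/Kuala_Lumpur')::date AS local_date  -- 2020-11-20: day in Kuala Lumpur
FROM  (SELECT timestamptz '2020-11-19 16:21:53 +00:00') t(ts);    -- row 8 of the sample data
```

With a different session time zone, cast_date changes while local_date does not.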

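Where to start? You need to understand the concept of time zones and the Postgres data types timestamp with time zone (timestamptz) vs. timestamp without time zone (timestamp). Otherwise, there will be endless confusion. Start here: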

  • Ignoring time zones altogether in Rails and PostgreSQL

Most notably, timestamptz does not store the time zone:

  • Time zone storage in data type "timestamp with time zone"
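A small illustration that the offset in a timestamptz literal is only used on input: two spellings of the same instant compare equal, and output follows the session's TimeZone setting (set to UTC here for the example; the displayed format assumes the default ISO DateStyle):

```sql
SET TIME ZONE 'UTC';  -- display of timestamptz values depends on this session setting

SELECT timestamptz '2020-11-20 00:21:53 +08:00'
     = timestamptz '2020-11-19 16:21:53 +00:00'  AS same_instant  -- true
     , timestamptz '2020-11-20 00:21:53 +08:00'  AS displayed;    -- 2020-11-19 16:21:53+00, the +08 is gone
```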

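When you simply cast timestamptz to date or timestamp, the current time zone setting of the session is assumed, which is not what you want. Provide the time zone explicitly with the AT TIME ZONE construct to avoid this pitfall. In your fiddle you have both: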

...
, generate_series(
startdate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur', 
enddate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur', 
interval '1 day') g(d)
...

Which does not do what you want. After the (faulty!) cast to timestamp, the AT TIME ZONE construct converts the values back to timestamptz.
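AT TIME ZONE converts in both directions, and the result type flips with the input type. This sketch checks the types with pg_typeof() and reproduces the fiddle's expression for the first sample row (which is startdate there), with the session time zone set to UTC:

```sql
SET TIME ZONE 'UTC';  -- the ::timestamp cast in the last column depends on this setting

SELECT pg_typeof(timestamptz '2020-11-20 08:52:01 +00:00' AT TIME ZONE 'Asia/Kuala_Lumpur') AS from_timestamptz  -- timestamp without time zone
     , pg_typeof(timestamp   '2020-11-20 16:52:01'        AT TIME ZONE 'Asia/Kuala_Lumpur') AS from_timestamp    -- timestamp with time zone
     , (timestamptz '2020-11-18 07:10:25 +00:00')::timestamp
          AT TIME ZONE 'Asia/Kuala_Lumpur'                                                   AS fiddle_expr;      -- 2020-11-17 23:10:25+00
```

Instead of yielding the local time 2020-11-18 15:10:25, the fiddle's expression shifts the instant 8 hours in the wrong direction.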

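Also, your query generates the full Cartesian product of all users and the maximum range of days in table attendance, only to reduce it to a single day with: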

where created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'
and   created_at <  timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' + interval '1 day'

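This WHERE clause finally does what it should. But it makes no sense to first generate the full range of days and then discard most of them. (Seems like you copied this from another fiddle of mine in the meantime?)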
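For a report on a single day, a sketch of the direct approach: skip generate_series() and filter on bounds computed from the local day. The params CTE (columns tz and the_day) is a hypothetical stand-in for bind parameters, and the attendance CTE inlines rows 7 to 12 of the sample data; drop it to run against the real table:

```sql
WITH params(tz, the_day) AS (  -- hypothetical parameters; swap in 'America/New_York' etc.
   VALUES (text 'Asia/Kuala_Lumpur', date '2020-11-20')
   )
, attendance(id, employee_id, activity_type, created_at) AS (  -- rows 7-12 of the sample data
   VALUES
     ( 7, 1, 1, timestamptz '2020-11-19 07:11:23 +00:00')
   , ( 8, 1, 2, timestamptz '2020-11-19 16:21:53 +00:00')
   , ( 9, 1, 1, timestamptz '2020-11-19 19:11:23 +00:00')
   , (10, 1, 2, timestamptz '2020-11-19 20:21:53 +00:00')
   , (11, 1, 1, timestamptz '2020-11-20 07:41:38 +00:00')
   , (12, 1, 2, timestamptz '2020-11-20 08:52:01 +00:00')
   )
SELECT a.employee_id
     , array_agg(a.id ORDER BY a.created_at)                                   AS rows
     , min(a.created_at AT TIME ZONE p.tz) FILTER (WHERE a.activity_type = 1) AS min_check_in   -- local time
     , max(a.created_at AT TIME ZONE p.tz) FILTER (WHERE a.activity_type = 2) AS max_check_out  -- local time
FROM   attendance a
CROSS  JOIN params p
WHERE  a.created_at >= p.the_day::timestamp       AT TIME ZONE p.tz  -- local midnight of the day, as timestamptz
AND    a.created_at <  (p.the_day + 1)::timestamp AT TIME ZONE p.tz  -- local midnight of the next day (exclusive)
GROUP  BY a.employee_id;
```

With these inputs the sketch should return one row for employee 1 with rows {8,9,10,11,12}. Since created_at itself is not wrapped in an expression, the predicates stay index-friendly, and changing tz moves both bounds accordingly.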

I commented out the WHERE clause and kept an optimized version of your generate_series() in the query as proof of concept. Further reading:

  • Generating time series between two dates in PostgreSQL
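When generating one row per local day from arbitrary start and end timestamps, truncating to midnight matters. A sketch with hypothetical bounds (first event late in the evening, last one early in the morning):

```sql
-- Truncated bounds: one row per calendar day
SELECT d::date AS the_date  -- 2020-11-18, 2020-11-19, 2020-11-20
FROM   generate_series(date_trunc('day', timestamp '2020-11-18 23:00')  -- hypothetical first event
                     , date_trunc('day', timestamp '2020-11-20 01:00')  -- hypothetical last event
                     , interval '1 day') g(d);

-- Untruncated bounds: steps from 23:00 to 23:00
SELECT count(*)  -- 2: the row for 2020-11-20 is missing
FROM   generate_series(timestamp '2020-11-18 23:00'
                     , timestamp '2020-11-20 01:00'
                     , interval '1 day');
```

Without date_trunc(), the series stops before reaching 2020-11-20, so that day would silently drop out of the grid.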
