Postgres - Calculating changes from cumulative data



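I collect data from several API sources with Python and add it to 2 tables in Postgres.

I then use this data to build reports, joining and grouping/filtering it. I add thousands of rows every day.

Cost, revenue, and sales are always cumulative: each data point covers everything from t1 for that product, and t2 is the time the data was retrieved.

So the most recent data pull includes all previous data back to t1. t1 and t2 are timestamp without time zone columns in Postgres. I am currently using Postgres 10.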

Sample:

id, vendor_id, product_id, t1, t2, cost, revenue, sales
1, a, a, 2018-01-01, 2018-04-18, 50, 200, 34
2, a, b, 2018-05-01, 2018-04-18, 10, 100, 10
3, a, c, 2018-01-02, 2018-04-18, 12, 100, 9
4, a, d, 2018-01-03, 2018-04-18, 12, 100, 8
5, b, e, 2018-25-02, 2018-04-18, 12, 100, 7
6, a, a, 2018-01-01, 2018-04-17, 40, 200, 30
7, a, b, 2018-05-01, 2018-04-17, 0, 95, 8
8, a, c, 2018-01-02, 2018-04-17, 10, 12, 5
9, a, d, 2018-01-03, 2018-04-17, 8, 90, 4
10, b, e, 2018-25-02, 2018-04-17, 9, 0-, 3

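Cost and revenue come from two tables, which I join on vendor_id, product_id, and t2.

Is there a way to go through all the data, "shift" it, and subtract, so that instead of cumulative data I have time-series data?

Should this be done before storing, or is it better done when building the report?

For reference, if I currently want a report on the change between two points in time, I run two subqueries. That feels backwards compared with storing the data as a time series and simply aggregating over the required interval.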

with report1 as (select ...),
report2 as (select ...)
select .. from report1 left outer join report2 on ...

Thanks a lot in advance!

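You can use LAG():

Window function:

Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.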
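
To make the offset/default behaviour concrete, here is a minimal Python sketch that mimics lag() over a single partition that is already ordered by t2. The helper name `lag` and the list `cumulative_cost` are only illustrative (the numbers are the cost values for vendor a, product a from the sample query in this answer); this is not how Postgres implements it.

```python
from typing import List, Optional, Sequence


def lag(values: Sequence[int], offset: int = 1,
        default: Optional[int] = None) -> List[Optional[int]]:
    """Mimic lag() over one partition that is already in window order."""
    return [
        values[i - offset] if 0 <= i - offset < len(values) else default
        for i in range(len(values))
    ]


# Cumulative cost for one vendor/product, ordered by t2 (oldest first).
cumulative_cost = [16, 35, 50]

previous = lag(cumulative_cost)  # first row has no predecessor -> None

# Same idea as coalesce(cost - lag(cost), cost): keep the first value as is.
deltas = [
    cur if prev is None else cur - prev
    for cur, prev in zip(cumulative_cost, previous)
]

print(previous)  # [None, 16, 35]
print(deltas)    # [16, 19, 15]
```

The deltas add back up to the latest cumulative value (16 + 19 + 15 = 50), which is a quick sanity check on any de-cumulated series.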

with sample_data as (
select 1 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 50 as cost, 200 as revenue, 36 as sales
union all
select 2 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 55 as cost, 200 as revenue, 34 as sales
union all
select 3 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 35 as cost, 150 as revenue, 25 as sales
union all
select 4 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 25 as cost, 140 as revenue, 23 as sales
union all
select 5 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 16 as cost, 70 as revenue, 12 as sales
union all
select 6 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 13 as cost, 65 as revenue, 11 as sales
)
select sd.*
-- lag() returns the previous snapshot's cumulative value for the same vendor/product;
-- the first snapshot has no predecessor, so coalesce() falls back to the cumulative value itself
, coalesce(cost - lag(cost) over w, cost) as cost_new
, coalesce(revenue - lag(revenue) over w, revenue) as revenue_new
, coalesce(sales - lag(sales) over w, sales) as sales_new
from sample_data sd
window w as (partition by vendor_id, product_id order by t2)
order by t2 desc;
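
If you would rather compute the differences before storing, the same logic can be done in the Python step that loads the data. The following is only a rough sketch under a few assumptions: the rows are plain tuples, t1 is left out for brevity, t2 is an ISO date string (so string order equals chronological order), and `to_deltas` is a hypothetical helper name. It uses the same values as sample_data in the query.

```python
from itertools import groupby
from operator import itemgetter

# (id, vendor_id, product_id, t2, cost, revenue, sales), same values as sample_data
rows = [
    (1, "a", "a", "2018-04-18", 50, 200, 36),
    (2, "a", "b", "2018-04-18", 55, 200, 34),
    (3, "a", "a", "2018-04-17", 35, 150, 25),
    (4, "a", "b", "2018-04-17", 25, 140, 23),
    (5, "a", "a", "2018-04-16", 16, 70, 12),
    (6, "a", "b", "2018-04-16", 13, 65, 11),
]


def to_deltas(rows):
    """Return {id: (cost_new, revenue_new, sales_new)} per vendor/product."""
    key = itemgetter(1, 2)  # partition by vendor_id, product_id
    ordered = sorted(rows, key=lambda r: (key(r), r[3]))  # then order by t2
    result = {}
    for _, group in groupby(ordered, key=key):
        prev = None
        for r in group:
            metrics = r[4:7]
            # First snapshot keeps its cumulative values; later ones are differenced.
            result[r[0]] = metrics if prev is None else tuple(
                cur - old for cur, old in zip(metrics, prev)
            )
            prev = metrics
    return result


deltas = to_deltas(rows)
for row_id in sorted(deltas):
    print(row_id, deltas[row_id])
```

One trade-off to keep in mind: if you store only the differences, a missed or corrected pull is harder to repair than when the raw cumulative snapshots are kept and the differences are derived at report time.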
