PostgreSQL window function over a sliding time window



I have the following data:

record_date country  objectid  objectuse
2022-07-20    chile         0          4
2022-07-01    chile         1          4
2022-07-02    chile         1          4
2022-07-03    chile         1          4
2022-07-04    chile         1          4
...             ...       ...        ...
2022-07-26     peru      3088          4
2022-07-27     peru      3088          4
2022-07-28     peru      3088          4
2022-07-30     peru      3088          4
2022-07-31     peru      3088          4

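The data describes the daily usage of objects within a country during one month (July 2022). Not every object is used every day. One thing I'm interested in is the sum of each object's maximum for the month: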

WITH month_max AS (
    SELECT
        country,
        objectid,
        MAX(objectuse) AS maxuse
    FROM mytable
    GROUP BY
        country,
        objectid
)
SELECT
    country,
    SUM(maxuse)
FROM month_max
GROUP BY country;

The result is:

country   sum
-------------
chile    1224
peru    17008   

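But what I really want is a rolling sum of the per-object maxima from the start of the month up to each date, so that I end up with something like this: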

record_date country       sum
2022-07-01    chile         1
2022-07-01     peru         1
2022-07-02    chile         2
2022-07-02     peru         3
...             ...       ...
2022-07-31    chile       1224
2022-07-31     peru      17008

I tried a window function like this, but it didn't work:

SELECT
    *,
    SUM(objectuse) OVER (
        PARTITION BY country
        ORDER BY record_date ROWS 30 PRECEDING
    ) AS cumesum
FROM mytable
ORDER BY cumesum DESC;
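
Judging from the query alone, two things stand in the way of this attempt. ROWS 30 PRECEDING counts 30 rows, not 30 days, and with many objects per day in each country partition those rows span far less than a month. It also sums the raw objectuse values rather than per-object maxima. For the first point only, a frame measured in time could look like the sketch below. It assumes PostgreSQL 11 or later and a record_date column of type date or timestamp, and it does not by itself produce the desired result.

```sql
-- Sketch only: a time-based frame instead of a row-count frame.
-- Assumes PostgreSQL 11+ and that record_date is a date or timestamp.
SELECT
    record_date,
    country,
    objectid,
    objectuse,
    SUM(objectuse) OVER (
        PARTITION BY country
        ORDER BY record_date
        -- Covers the current day plus the 30 days before it.
        RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW
    ) AS use_last_30_days
FROM mytable;
```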

Is there a way to achieve the desired result in SQL?

Thanks in advance.

Edit: For what it's worth, I asked the same question for Pandas and got an answer there; it may help in working out how to do this in SQL.

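What ended up working is probably not the most efficient way to solve this. For each day of the month, I build a look-back bucket that runs from that day back to the start of the month. Within each bucket, I take the maximum objectuse for each objectid, and then I sum those maxima over the look-back period. I do this for every day in the data.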

Here is the query:

WITH daily_lookback AS (
    SELECT
        A.record_date,
        A.country,
        B.objectid,
        MAX(B.objectuse) AS maxuse
    FROM mytable AS A
    LEFT JOIN mytable AS B
        ON A.record_date >= B.record_date
        AND A.country = B.country
        AND DATE_PART('month', A.record_date) = DATE_PART('month', B.record_date)
        AND DATE_PART('year', A.record_date) = DATE_PART('year', B.record_date)
    GROUP BY
        A.record_date,
        A.country,
        B.objectid
)
SELECT
    record_date,
    country,
    SUM(maxuse) AS usetotal
FROM daily_lookback
GROUP BY
    record_date,
    country
ORDER BY
    record_date;

This gives exactly what I wanted: the cumulative sum of the per-objectid maxima over the look-back period, like so:

record_date country       sum
2022-07-01    chile         1
2022-07-01     peru         1
2022-07-02    chile         2
2022-07-02     peru         3
...             ...       ...
2022-07-31    chile       1224
2022-07-31     peru      17008

You need to change the inner query to use a windowed (running) max:

WITH month_max AS (
    SELECT record_date, country, objectid,
           MAX(objectuse) OVER (PARTITION BY country, objectid ORDER BY record_date) AS mx
    FROM mytable
)
SELECT record_date, country, SUM(mx) AS "sum"
FROM month_max
GROUP BY record_date, country;

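This does assume there is one row per object per date. An object with no row on a given date contributes nothing to that date's sum.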
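
Since the question says that not every object is used every day, that assumption may not hold. One possible way around it, offered as a sketch rather than as part of the answer above, is to sum the increases in each object's running maximum and then accumulate those increases by date. The table usage_demo and the names running, delta, daily and inc are made up for illustration.

```sql
-- Hypothetical demo table: object 2 has no rows on 2022-07-01 or 2022-07-03.
CREATE TEMP TABLE usage_demo (
    record_date date,
    country     text,
    objectid    int,
    objectuse   int
);

INSERT INTO usage_demo VALUES
    ('2022-07-01', 'chile', 1, 1),
    ('2022-07-02', 'chile', 2, 2),
    ('2022-07-03', 'chile', 1, 3);

WITH running AS (
    -- Running maximum per object, as in the query above.
    SELECT record_date, country, objectid,
           MAX(objectuse) OVER (PARTITION BY country, objectid
                                ORDER BY record_date) AS mx
    FROM usage_demo
),
delta AS (
    -- How much each row raises its object's running maximum.
    SELECT record_date, country,
           mx - COALESCE(LAG(mx) OVER (PARTITION BY country, objectid
                                       ORDER BY record_date), 0) AS inc
    FROM running
),
daily AS (
    -- Total increase per country and date.
    SELECT record_date, country, SUM(inc) AS inc
    FROM delta
    GROUP BY record_date, country
)
SELECT record_date, country,
       SUM(inc) OVER (PARTITION BY country ORDER BY record_date) AS usetotal
FROM daily
ORDER BY record_date, country;
-- usetotal for chile: 1 on 2022-07-01, 3 on 2022-07-02, 5 on 2022-07-03
```

On the same demo rows, summing mx per date directly would give 1, 2 and 3, because object 2 is missing from the first and last dates. If the table spans more than one month, date_trunc('month', record_date) would also need to be added to each PARTITION BY so that the maxima restart every month.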

Here is a rewritten version of your query. With an index, it may run faster:

select record_date, country, min(usetotal) as usetotal
from mytable d
inner join lateral (
    select distinct sum(max(objectuse)) over () as usetotal
    from mytable a
    where a.record_date between date_trunc('month', d.record_date) and d.record_date
      and a.country = d.country
    group by objectid
) T on 1 = 1
group by record_date, country
order by record_date, country;

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=63760e30ecf4c885ec4967045b6cd03
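
The answer does not say which index it has in mind. One plausible candidate, stated here as an assumption, is a composite index that matches the lateral subquery's filter: equality on country, then a range on record_date. The index name is hypothetical.

```sql
-- Hypothetical index name. Column order follows the subquery's filter:
-- equality on country first, then the range condition on record_date.
CREATE INDEX IF NOT EXISTS mytable_country_record_date_idx
    ON mytable (country, record_date);

-- Inspect the plan for one look-back window to see whether the index is used.
EXPLAIN
SELECT objectid, max(objectuse)
FROM mytable
WHERE country = 'chile'
  AND record_date BETWEEN DATE '2022-07-01' AND DATE '2022-07-15'
GROUP BY objectid;
```

Whether the planner picks the index depends on table size and statistics, so it is worth checking the plan on real data before relying on it.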
