I have the following data:
            country  objectid  objectuse
record_date
2022-07-20    chile         0          4
2022-07-01    chile         1          4
2022-07-02    chile         1          4
2022-07-03    chile         1          4
2022-07-04    chile         1          4
...             ...       ...        ...
2022-07-26     peru      3088          4
2022-07-27     peru      3088          4
2022-07-28     peru      3088          4
2022-07-30     peru      3088          4
2022-07-31     peru      3088          4
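The data describes the daily usage of objects within a country (during July 2022); not every object is used every day. One thing I am interested in is the sum of the monthly maximums for the month: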
WITH month_max AS (
    SELECT
        country,
        objectid,
        MAX(objectuse) AS maxuse
    FROM mytable
    GROUP BY
        country,
        objectid
)
SELECT
    country,
    SUM(maxuse)
FROM month_max
GROUP BY country;
The result is:
country sum
-------------
chile 1224
peru 17008
But what I really want is a rolling sum of the maximums from the start of the month up to each date, so that I get something like:
            country    sum
record_date
2022-07-01    chile      1
2022-07-01     peru      1
2022-07-02    chile      2
2022-07-02     peru      3
...             ...    ...
2022-07-31    chile   1224
2022-07-31     peru  17008
I tried using a window function like this, but it did not work:
SELECT
    *,
    SUM(objectuse) OVER (
        PARTITION BY country
        ORDER BY record_date ROWS 30 PRECEDING
    ) AS cumesum
FROM mytable
ORDER BY cumesum DESC;
Is there a way to achieve the desired result in SQL?
Thanks in advance.
Edit: For what it's worth, I asked the same question for Pandas and got an answer there; perhaps it helps in figuring out how to do this in SQL.
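What ended up working is probably not the most efficient way to solve this. For each day of the month, I essentially created a backward-looking bucket that runs from that day back to the start of the month. Within each bucket, I take the maximum objectuse for each objectid. Having taken the maximums, I sum all of them for that lookback period. I do this for every day in the data.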
Here is the query:
WITH daily_lookback AS (
    SELECT
        A.record_date,
        A.country,
        B.objectid,
        MAX(B.objectuse) AS maxuse
    FROM mytable AS A
    LEFT JOIN mytable AS B
        ON A.record_date >= B.record_date
        AND A.country = B.country
        AND DATE_PART('month', A.record_date) = DATE_PART('month', B.record_date)
        AND DATE_PART('year', A.record_date) = DATE_PART('year', B.record_date)
    GROUP BY
        A.record_date,
        A.country,
        B.objectid
)
SELECT
    record_date,
    country,
    SUM(maxuse) AS usetotal
FROM daily_lookback
GROUP BY
    record_date,
    country
ORDER BY
    record_date;
This gives exactly what I wanted: the cumulative sum of the per-objectid maximums over the lookback period, as shown below:
            country    sum
record_date
2022-07-01    chile      1
2022-07-01     peru      1
2022-07-02    chile      2
2022-07-02     peru      3
...             ...    ...
2022-07-31    chile   1224
2022-07-31     peru  17008
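To make the lookback logic easy to check by hand, here is a minimal sketch that runs the same self-join on a tiny made-up table. It uses Python's built-in sqlite3 instead of PostgreSQL, so the two DATE_PART comparisons are replaced by a single strftime('%Y-%m', ...) comparison; the toy rows, the ROWS and LOOKBACK_SQL names and the running_max_sum helper are all assumptions for illustration, not part of the original data.

```python
import sqlite3

# Hypothetical toy rows: (record_date, country, objectid, objectuse).
# Object 1 has no row on 2022-07-03 and peru has no rows on 2022-07-02.
ROWS = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-02", "chile", 1, 3),
    ("2022-07-02", "chile", 2, 2),
    ("2022-07-03", "chile", 2, 5),
    ("2022-07-01", "peru", 7, 4),
    ("2022-07-03", "peru", 7, 2),
]

# Same idea as the query above: for each (date, country), join every row of
# the same country and calendar month with an earlier-or-equal date, take the
# maximum per objectid, then sum those maximums.
LOOKBACK_SQL = """
WITH daily_lookback AS (
    SELECT A.record_date, A.country, B.objectid, MAX(B.objectuse) AS maxuse
    FROM mytable AS A
    JOIN mytable AS B
        ON B.country = A.country
        AND B.record_date <= A.record_date
        AND strftime('%Y-%m', B.record_date) = strftime('%Y-%m', A.record_date)
    GROUP BY A.record_date, A.country, B.objectid
)
SELECT record_date, country, SUM(maxuse) AS usetotal
FROM daily_lookback
GROUP BY record_date, country
ORDER BY record_date, country;
"""


def running_max_sum(rows):
    """Load rows into an in-memory SQLite table and run the lookback query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable ("
        "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(LOOKBACK_SQL).fetchall()
    con.close()
    return result


result = running_max_sum(ROWS)
for row in result:
    print(row)
```

On this toy data, chile on 2022-07-03 still counts the earlier maximum of object 1 even though that object has no row on that day, because the join looks back over all earlier rows of the month. A date only appears in the output for a country if that country has at least one row on that date.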
You need to change the inner query to use a window max:
WITH month_max AS (
    SELECT record_date, country, objectid,
        MAX(objectuse) OVER (PARTITION BY country, objectid ORDER BY record_date) AS mx
    FROM mytable
)
SELECT record_date, country, SUM(mx) AS "sum"
FROM month_max
GROUP BY record_date, country;
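This does assume there is one row per object per date. On a date where an object has no row, that object's running maximum is left out of that date's sum.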
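Since the question says not every object is used every day, that assumption may not hold. The sketch below, again using Python's built-in sqlite3 on made-up rows, first runs the window-max query as written and then one possible workaround: build a grid of every date and every object per country, left join the real rows onto it, and take the window max over the grid. The toy data, the WINDOW_SQL and GRID_SQL names and the run_query helper are assumptions for illustration; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical toy rows: (record_date, country, objectid, objectuse).
# Object 1 in chile has no row on 2022-07-03.
ROWS = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-02", "chile", 1, 3),
    ("2022-07-02", "chile", 2, 2),
    ("2022-07-03", "chile", 2, 5),
    ("2022-07-01", "peru", 7, 4),
    ("2022-07-03", "peru", 7, 2),
]

# The window-max query from the answer, unchanged apart from an ORDER BY.
WINDOW_SQL = """
WITH month_max AS (
    SELECT record_date, country, objectid,
        MAX(objectuse) OVER (PARTITION BY country, objectid
                             ORDER BY record_date) AS mx
    FROM mytable
)
SELECT record_date, country, SUM(mx) AS total
FROM month_max
GROUP BY record_date, country
ORDER BY record_date, country;
"""

# Workaround sketch: one grid row per (date, object) within each country, so
# the running max is carried forward onto dates where the object has no row.
GRID_SQL = """
WITH days AS (SELECT DISTINCT country, record_date FROM mytable),
objs AS (SELECT DISTINCT country, objectid FROM mytable),
grid AS (
    SELECT d.record_date, d.country, o.objectid, m.objectuse
    FROM days AS d
    JOIN objs AS o ON o.country = d.country
    LEFT JOIN mytable AS m
        ON m.country = d.country
        AND m.objectid = o.objectid
        AND m.record_date = d.record_date
),
running AS (
    SELECT record_date, country,
        MAX(objectuse) OVER (PARTITION BY country, objectid
                             ORDER BY record_date) AS mx
    FROM grid
)
SELECT record_date, country, SUM(mx) AS total
FROM running
GROUP BY record_date, country
ORDER BY record_date, country;
"""


def run_query(sql, rows):
    """Load rows into an in-memory SQLite table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable ("
        "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


window_result = run_query(WINDOW_SQL, ROWS)
grid_result = run_query(GRID_SQL, ROWS)
print("window max on raw rows:", window_result)
print("window max on full grid:", grid_result)
```

On the toy rows, the raw window-max query gives 5 for chile on 2022-07-03 (only object 2 has a row that day), while the grid version gives 8 because object 1's earlier maximum of 3 is carried forward. The sketch partitions only by country and objectid, so it assumes the table holds a single month, as in the question.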
Here is a rewritten version of your query. With an index, it will probably run faster:
select record_date, country, min(usetotal) as usetotal
from mytable d
inner join lateral (
    select distinct sum(max(objectuse)) over () as usetotal
    from mytable a
    where a.record_date between date_trunc('month', d.record_date) and d.record_date
      and a.country = d.country
    group by objectid
) T on 1 = 1
group by record_date, country
order by record_date, country;
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=63760e30ecf4c885ec4967045b6cd03