How to include the current row in the PARTITION BY of a window function in PostgreSQL



I am trying to do the following. Suppose I want to split a table into two partitions based on a given condition:

SELECT
    userid,
    ARRAY_AGG(userid) OVER (
        PARTITION BY userid > 100
    ) arr,
    AVG(userid) OVER (
        PARTITION BY userid > 100
    ) avg
FROM users;

I get this:

userid |                            arr                            |         avg          
--------+-----------------------------------------------------------+----------------------
46 | {46,23,69,92}                                             |  57.5000000000000000
23 | {46,23,69,92}                                             |  57.5000000000000000
69 | {46,23,69,92}                                             |  57.5000000000000000
92 | {46,23,69,92}                                             |  57.5000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ...                                                       | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143

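That is all fine, but what if instead, for the user IDs <= 100, I want the window to contain the current row together with the rows where userid > 100?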

SELECT
    userid,
    CASE WHEN userid > 100
        THEN ARRAY_AGG(userid) OVER (
            PARTITION BY userid > 100
        )
        ELSE ARRAY_AGG(userid) OVER (
            PARTITION BY userid -- OR userid > 100
            -- PARTITION BY userid > 100 OR CURRENT_ROW
            -- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
        )
    END arr,
    CASE WHEN userid > 100
        THEN AVG(userid) OVER (
            PARTITION BY userid > 100
        )
        ELSE AVG(userid) OVER (
            PARTITION BY userid -- OR userid > 100
            -- PARTITION BY userid > 100 OR CURRENT_ROW
            -- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
        )
    END avg
FROM users;

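All of the commented-out code above shows the various attempts I have made. The best I have managed is either only the user ID itself, without the IDs above 100, or all of the user IDs: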

userid |                            arr                            |         avg          
--------+-----------------------------------------------------------+----------------------
23 | {23}                                                      |  23.0000000000000000
46 | {46}                                                      |  46.0000000000000000
69 | {69}                                                      |  69.0000000000000000
92 | {92}                                                      |  92.0000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ...                                                       | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143

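Is there any way to do what I want? I am also trying to avoid CTEs, because the real code carries so much technical debt that adapting it to use WITH would take a considerable amount of time.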

To be clear, this is the expected result:

userid |                             arr                              |         avg
--------+--------------------------------------------------------------+----------------------
23 | {23,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 588.6000000000000000
46 | {46,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 590.1333333333333334
69 | {69,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 591.6666666666666667
92 | {92,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 593.2000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529}    | 629.2142857142857143
... | ...                                                          | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529}    | 629.2142857142857143

For reference, here is a potential future feature I have been looking into: nested window functions (not yet implemented as of PostgreSQL 11).

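Edit: Last but not least, the condition is a placeholder! It may or may not be tied to the user ID; it is used here only for the sake of the example. It could just as well have been: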

CUME_DIST() OVER (
PARTITION BY x -- OR CURRENT_USERID
)

This answers the original version of the question.

You seem to want:

select userid,
       (case when userid <= 100
             then array_cat(array[userid],
                            array_agg(userid) filter (where userid > 100) over ())
             else array_agg(userid) filter (where userid > 100) over ()
        end) as arr
from users;
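
As a fuller sketch of the same idea, the query below also covers the avg column from the question and can be run on its own. The sample values are the user IDs shown in the question's output. The single-column users table definition is an assumption, as is the premise that userid is a unique integer. The second query is an alternative that is not part of the approach above: it uses a LATERAL subquery instead of window functions, which may be easier to adapt when the condition is only a placeholder, and it needs no CTE.

```sql
-- Sample data taken from the question's output; the table definition
-- itself is an assumption made for illustration.
CREATE TEMP TABLE users (userid integer PRIMARY KEY);
INSERT INTO users (userid) VALUES
    (23), (46), (69), (92),
    (552), (506), (575), (621), (644), (667), (690),
    (759), (713), (782), (828), (460), (483), (529);

-- Variant 1: filtered window aggregates over the whole table.
SELECT
    userid,
    CASE WHEN userid > 100
         THEN array_agg(userid) FILTER (WHERE userid > 100) OVER ()
         -- Prepend the current row to the shared "> 100" array.
         ELSE array_prepend(
                  userid,
                  array_agg(userid) FILTER (WHERE userid > 100) OVER ())
    END AS arr,
    CASE WHEN userid > 100
         THEN avg(userid) FILTER (WHERE userid > 100) OVER ()
         -- Average over the current row plus the "> 100" rows.
         -- Yields NULL if no row satisfies userid > 100.
         ELSE (userid + sum(userid) FILTER (WHERE userid > 100) OVER ())::numeric
              / (1 + count(*) FILTER (WHERE userid > 100) OVER ())
    END AS avg
FROM users;

-- Variant 2: a LATERAL subquery that collects, for each row, the rows
-- matching the condition plus the current row itself.
-- Assumes userid identifies a row uniquely.
SELECT u.userid, x.arr, x.avg
FROM users AS u
CROSS JOIN LATERAL (
    SELECT array_agg(v.userid) AS arr,
           avg(v.userid)       AS avg
    FROM users AS v
    WHERE v.userid > 100          -- the placeholder condition
       OR v.userid = u.userid     -- always include the current row
) AS x;
```

Neither query fixes the order of the elements inside arr; add an ORDER BY inside array_agg if that order matters. The second variant scans users once per outer row, so on a large table it will likely be slower than the window-function form.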
