如何通过固定一些变量来计算中位数



我有一个已经聚合的数据集。这基本上显示了每只猫、root_cat和集群每天的中间价格。

date         cluster  root_cat cat median_price
2020-12-07    A          X     1      20
2020-12-07    A          X     2      15
2020-12-07    A          X     2      30
2020-12-08    B          Y     3      24   

这是我为计算中间价格而写的查询。

SELECT date,
page_impressions_cluster,
root_cat,
cat,
MAX(CASE
WHEN tile2 = 1 THEN
min_price/100 END) AS median
FROM 
(SELECT pl.*,
NTILE(2)
OVER (PARTITION BY product_id
ORDER BY  min_price) AS tile2
FROM pl
WHERE cluster is NOT null
AND (date_parse(date, '%Y-%m-%d') >= current_date - interval '15' day) ) d
GROUP BY  1, 2, 3, 4

现在,我想再写一个专栏,显示除最近一天外,每只猫和root_cat在过去14天的中间价格。我该怎么做?

以下是所需的输出:

date         cluster  root_cat cat median_price  median_price_root     median_price_cat
2020-12-07    A          X     1      20                20                    20
2020-12-07    A          X     2      15                20                    22,5
2020-12-07    A          X     2      30                20                    22,5
2020-12-08    B          Y     3      24                24                    24 

如果中值的近似值足够好,那么可以使用

SELECT date,
page_impressions_cluster,
root_cat,
cat,
MAX(CASE
WHEN tile2 = 1 THEN
min_price/100 END) AS median,
approx_percentile(price, 0.5) -- <<== the 0.5 percentile is the median
FROM ...

请参阅此处的文档以获取approc_percentile函数。

最新更新