我有一个已经聚合的数据集。这基本上显示了每只猫、root_cat和集群每天的中间价格。
date cluster root_cat cat median_price
2020-12-07 A X 1 20
2020-12-07 A X 2 15
2020-12-07 A X 2 30
2020-12-08 B Y 3 24
这是我为计算中间价格而写的查询。
SELECT date,
page_impressions_cluster,
root_cat,
cat,
MAX(CASE
WHEN tile2 = 1 THEN
min_price/100 END) AS median
FROM
(SELECT pl.*,
NTILE(2)
OVER (PARTITION BY product_id
ORDER BY min_price) AS tile2
FROM pl
WHERE cluster is NOT null
AND (date_parse(date, '%Y-%m-%d') >= current_date - interval '15' day) ) d
GROUP BY 1, 2, 3, 4
现在,我想再写一个专栏,显示除最近一天外,每只猫和root_cat在过去14天的中间价格。我该怎么做?
以下是所需的输出:
date cluster root_cat cat median_price median_price_root median_price_cat
2020-12-07 A X 1 20 20 20
2020-12-07 A X 2 15 20 22,5
2020-12-07 A X 2 30 20 22,5
2020-12-08 B Y 3 24 24 24
如果中值的近似值足够好,那么可以使用
SELECT date,
page_impressions_cluster,
root_cat,
cat,
MAX(CASE
WHEN tile2 = 1 THEN
min_price/100 END) AS median,
approx_percentile(price, 0.5) -- <<== the 0.5 percentile is the median
FROM ...
请参阅此处的文档以获取approc_percentile
函数。