I have the following SQL query in Google BigQuery:
SELECT
sensor,
date,
time,
perc,
PERCENTILE_CONT(perc,
0.2) OVER(PARTITION BY sensor, date) AS percentile20_day,
PERCENTILE_CONT(perc,
0.8) OVER(PARTITION BY sensor, date) AS percentile80_day
FROM
sensordata
ORDER BY
Sensor,
date,
time
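Now I would like to add one more calculation: the average of the values that lie between two percentiles. In this particular case, that is the average of all values between the 20th and 80th percentile for the whole day. I don't want to be tied to the day, though, and may later change it to a different time window (for example, by adding further partitions).
So far I have not been able to get this working, and I don't see how to add it.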
You need to use a subquery:
SELECT sensor, date, time, perc, percentile20_day, percentile80_day,
       -- Rows outside the 20th-80th percentile range become NULL and are skipped by AVG
       AVG(CASE WHEN perc >= percentile20_day AND perc <= percentile80_day THEN perc END)
         OVER (PARTITION BY sensor, date) AS avg_20_80
FROM (SELECT sd.*,
             PERCENTILE_CONT(perc, 0.2) OVER (PARTITION BY sensor, date) AS percentile20_day,
             PERCENTILE_CONT(perc, 0.8) OVER (PARTITION BY sensor, date) AS percentile80_day
      FROM sensordata sd
     ) sd
ORDER BY sensor, date, time;
Below is for BigQuery Standard SQL:
#standardSQL
SELECT sensor, `date`, AVG(perc) avg_percentile20_80_day
FROM (
SELECT sensor, `date`, perc,
PERCENTILE_CONT(perc, 0.2) OVER(PARTITION BY sensor, `date`) percentile20_day,
PERCENTILE_CONT(perc, 0.8) OVER(PARTITION BY sensor, `date`) percentile80_day
FROM `project.dataset.sensordata`
)
WHERE perc BETWEEN percentile20_day AND percentile80_day
GROUP BY sensor, `date`
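To sanity-check what this query returns for a single (sensor, date) group, the same arithmetic can be sketched outside BigQuery. The following is a minimal Python sketch, not BigQuery code: `percentile_cont` and `avg_between_percentiles` are hypothetical helper names, the readings are made-up sample values, and the percentile is computed with linear interpolation between neighbouring sorted values, which is the scheme PERCENTILE_CONT is documented to use.

```python
import math


def percentile_cont(values, fraction):
    """Linearly interpolated percentile over a non-empty list of numbers."""
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def avg_between_percentiles(values, low=0.2, high=0.8):
    """Average of the values lying between two percentiles (bounds inclusive)."""
    p_low = percentile_cont(values, low)
    p_high = percentile_cont(values, high)
    kept = [v for v in values if p_low <= v <= p_high]
    return sum(kept) / len(kept)


# Made-up readings for one (sensor, date) partition, including one outlier.
readings = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 1000]

p20 = percentile_cont(readings, 0.2)
p80 = percentile_cont(readings, 0.8)
trimmed_avg = avg_between_percentiles(readings)
print(p20, p80, trimmed_avg)
```

With these sample readings the bounds land exactly on existing values (30 and 90), so the outlier 1000 and the lowest readings are excluded from the average, just as the WHERE ... BETWEEN filter excludes them in the query.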
Try this:
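SELECT date, AVG(CASE
    WHEN perc < percentile20_day THEN NULL
    WHEN perc > percentile80_day THEN NULL
    ELSE perc
  END) AS AvgSensorBTW20thAnd80thPercentile
FROM (
  -- Analytic functions cannot be nested inside AVG, so compute the bounds first
  SELECT date, perc,
    PERCENTILE_CONT(perc, 0.2) OVER(PARTITION BY sensor, date) AS percentile20_day,
    PERCENTILE_CONT(perc, 0.8) OVER(PARTITION BY sensor, date) AS percentile80_day
  FROM sensordata
)
GROUP BY date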
Note: untested!
According to the MSDN documentation, the AVG function ignores NULLs. BigQuery's AVG likewise skips NULL inputs.
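Because AVG skips NULLs, masking out-of-range values with a CASE expression gives the same result as filtering them out with WHERE before averaging. A small Python sketch of that equivalence, assuming made-up values and bounds, with `sql_avg` as a hypothetical stand-in for SQL's AVG and `None` playing the role of NULL:

```python
def sql_avg(values):
    """Mimic SQL AVG: skip None (NULL) inputs; return None if nothing is left."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Made-up readings and hypothetical percentile bounds for one group.
perc = [5, 30, 50, 70, 200]
low, high = 25, 96

# CASE-style: out-of-range values become None but the rows stay in the group.
masked = [v if low <= v <= high else None for v in perc]

# WHERE-style: out-of-range rows are removed before averaging.
filtered = [v for v in perc if low <= v <= high]

case_avg = sql_avg(masked)
where_avg = sql_avg(filtered)
print(case_avg, where_avg)
```

Both approaches average only 30, 50, and 70. The practical difference is that the CASE form keeps every row available, which matters when the average is wanted as a window function alongside the original rows.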