How to calculate the average of the values between the 20th and 80th percentiles



I have the following SQL query in Google BigQuery:

SELECT
  sensor,
  date,
  time,
  perc,
  PERCENTILE_CONT(perc,
    0.2) OVER(PARTITION BY sensor, date) AS percentile20_day,
  PERCENTILE_CONT(perc,
    0.8) OVER(PARTITION BY sensor, date) AS percentile80_day
FROM
  sensordata
ORDER BY
  Sensor,
  date,
  time

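Now I would like to add another calculation: the average of the values that lie between the two percentiles. In this particular case, that is the average of all values between the 20th and 80th percentiles for the whole day. However, I don't want to be restricted to a day and may later change this to a time window (for example, by adding further partitions).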

So far I have not been able to get this working. I don't see how to add it.

You need to use a subquery:

SELECT sensor, date, time, perc, percentile20_day, percentile80_day,
       -- CASE yields NULL outside the 20th-80th percentile range, and AVG skips NULLs
       AVG(CASE WHEN perc >= percentile20_day AND perc <= percentile80_day THEN perc END)
         OVER (PARTITION BY sensor, date) AS avg_20_80
FROM (SELECT sd.*,
             PERCENTILE_CONT(perc, 0.2) OVER (PARTITION BY sensor, date) AS percentile20_day,
             PERCENTILE_CONT(perc, 0.8) OVER (PARTITION BY sensor, date) AS percentile80_day
      FROM sensordata sd
     ) sd
ORDER BY sensor, date, time;

Below is a version for BigQuery Standard SQL:

#standardSQL
SELECT sensor, `date`, AVG(perc) avg_percentile20_80_day
FROM (
  SELECT sensor, `date`, perc,
    PERCENTILE_CONT(perc, 0.2) OVER(PARTITION BY sensor, `date`) percentile20_day,
    PERCENTILE_CONT(perc, 0.8) OVER(PARTITION BY sensor, `date`) percentile80_day
  FROM `project.dataset.sensordata`
)
WHERE perc BETWEEN percentile20_day AND percentile80_day
GROUP BY sensor, `date`
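
The question also asks about switching from a whole day to a narrower time window. A minimal sketch of that variation, following the same filter-then-aggregate pattern: it assumes `time` is a TIME column and buckets rows by hour. The names `hour_of_day`, `percentile20_hour`, `percentile80_hour` and `avg_percentile20_80_hour` are introduced here for illustration and do not come from the original query.

```sql
#standardSQL
-- Sketch only: same pattern as above, but one average per sensor, date and hour.
-- Assumes `time` is a TIME column; hour_of_day is a hypothetical helper column.
SELECT sensor, `date`, hour_of_day, AVG(perc) AS avg_percentile20_80_hour
FROM (
  SELECT sensor, `date`, hour_of_day, perc,
    -- Adding hour_of_day to the partition narrows the window from a day to an hour
    PERCENTILE_CONT(perc, 0.2) OVER (PARTITION BY sensor, `date`, hour_of_day) AS percentile20_hour,
    PERCENTILE_CONT(perc, 0.8) OVER (PARTITION BY sensor, `date`, hour_of_day) AS percentile80_hour
  FROM (
    SELECT sensor, `date`, perc, EXTRACT(HOUR FROM `time`) AS hour_of_day
    FROM `project.dataset.sensordata`
  )
)
-- Keep only the rows inside their own hour's 20th-80th percentile range
WHERE perc BETWEEN percentile20_hour AND percentile80_hour
GROUP BY sensor, `date`, hour_of_day
```

The bucket expression has to appear both in every PARTITION BY and in the GROUP BY, otherwise the percentiles and the average would be computed over different sets of rows.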

Try this:

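-- The percentiles are computed in a subquery because a window function
-- cannot be nested inside an aggregate such as AVG.
SELECT sensor, date, AVG(CASE
    WHEN perc < percentile20_day THEN NULL
    WHEN perc > percentile80_day THEN NULL
    ELSE perc
    END) AS AvgSensorBTW20thAnd80thPercentile
FROM (SELECT sensor, date, perc,
        PERCENTILE_CONT(perc, 0.2) OVER(PARTITION BY sensor, date) AS percentile20_day,
        PERCENTILE_CONT(perc, 0.8) OVER(PARTITION BY sensor, date) AS percentile80_day
      FROM sensordata) sd
GROUP BY sensor, date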

Note: untested!

According to the MSDN documentation, the AVG function ignores NULLs. BigQuery's AVG likewise skips NULL inputs.
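
To see the NULL-skipping behaviour on something small, here is a hedged, self-contained sketch that needs no table. The inline `sample` data, the sensor label 'a', and the column names `p20`, `p80`, `avg_all` and `avg_20_80` are all made up for illustration. With linear interpolation, the 20th and 80th percentiles of the five values 1, 2, 3, 4, 100 fall at 1.8 and 23.2, so only 2, 3 and 4 should contribute to the trimmed average, while the plain average is pulled up by the outlier.

```sql
#standardSQL
-- Illustrative data only: five readings for one hypothetical sensor, with one outlier.
WITH sample AS (
  SELECT 'a' AS sensor, perc
  FROM UNNEST([1.0, 2.0, 3.0, 4.0, 100.0]) AS perc
)
SELECT sensor,
  AVG(perc) AS avg_all,  -- plain average over all five values
  -- CASE without ELSE returns NULL for rows outside the range; AVG ignores those rows
  AVG(CASE WHEN perc BETWEEN p20 AND p80 THEN perc END) AS avg_20_80
FROM (
  SELECT sensor, perc,
    PERCENTILE_CONT(perc, 0.2) OVER (PARTITION BY sensor) AS p20,
    PERCENTILE_CONT(perc, 0.8) OVER (PARTITION BY sensor) AS p80
  FROM sample
)
GROUP BY sensor
```

Because the excluded rows become NULL rather than 0, they drop out of both the sum and the count, which is what makes the CASE form equivalent to filtering with WHERE before aggregating.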
