平均忽略最大值或最小值



我想使用avg获得某些值的平均值,但是只有当它们为1.5波纹管或高于第二个最大值和最小值时,才忽略最大和最小值。我将举一些例子:

示例1:

SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL ) D

我需要这个结果,忽略103.1值:

100.5
101.5

示例2:

SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL UNION
SELECT 106.2 v FROM DUAL) D

我需要这个结果,只忽略106.2值:

100.5
101.1
103.1

示例3:

SELECT *
FROM (
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 v FROM DUAL UNION
SELECT 103.0 v FROM DUAL UNION
SELECT 105.0 v FROM DUAL UNION
SELECT 107.0 v FROM DUAL) D

我需要这个结果,忽略100.0和107.0值:

102.0
103.0
105.0

当只有两个值时,它无关紧要。有了正确的结果,我可以正确地avg(value)。

您需要组合分析功能(铅/滞后)和条件聚集。这就是我想到的。请注意,我允许多个组,必须分别为每个组分别计算"调整后的"平均值(统计数据中的常见任务,当您必须在每个组存在的情况下抛出异常值时):

with
     inputs ( id, val ) as (
       select 101, 33   from dual union all
       select 102, 23   from dual union all
       select 102, 22.8 from dual union all
       select 103, 30   from dual union all
       select 103, 40   from dual union all
       select 104, 90   from dual union all
       select 104, 92   from dual union all
       select 104, 92   from dual union all
       select 104, 91.5 from dual union all
       select 104, 91.7 from dual
     )
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select    id, 
          avg ( case when cnt >= 3 
                        and ( lag_val is null and lead_val - val >= 1.5 
                              or
                              lead_val is null and val - lag_val >= 1.5
                            )
                     then null
                     else val
                end
              ) as adjusted_avg_val
from      (
            select id, val, count(val) over (partition by id) as cnt,
                   lag  ( val ) over ( partition by id order by val ) as lag_val,
                   lead ( val ) over ( partition by id order by val ) as lead_val
            from   inputs
          )
group by id
;

输出

 ID  ADJUSTED_AVG_VAL
---  ----------------
101                33
102              22.9
103                35                      
104              91.8

尝试使用row_number leadlag

的以下组合
with cte as (
SELECT 100.5 v FROM DUAL UNION ALL
SELECT 101.5 v FROM DUAL UNION ALL
SELECT 103.1 v FROM DUAL UNION ALL
SELECT 106.2 v FROM DUAL)
-- end of sample data
select avg(v)
from
(
  select row_number() over (order by v desc) arn, 
         row_number() over (order by v) drn, 
         lag(v) over (order by v) av,
         lead(v) over (order by v) dv,
         v
  from cte
) t
where (arn != 1 and drn != 1) or        -- if they are no maximum nor minumum
        (drn = 1 and v + 1.5 > dv) or   -- if they are minimum
        (arn = 1 and v - 1.5 < av) or   -- if they are maximum
        (av is null and arn < 3) or     -- if there are just two ore one value
        (dv is null and drn < 3)        -- if there are just two ore one value

在sql中,您只需要表达结果,所以...

WITH D as(
SELECT  100.0 v FROM DUAL UNION
SELECT 102.0 FROM DUAL UNION
SELECT 103.0 FROM DUAL UNION
SELECT 105.0 FROM DUAL UNION
SELECT 107.0 FROM DUAL)
SELECT  avg(v)
FROM D
where (v < (select max(v) from D )
       and ((select max(v) from D )
            -(select max(v) from D where v != 
              (select max(v) from D ) ) > 1.5))
or
(v > (select min(v) from D )
       and ((select min(v) from D )
            +(select min(v) from D where v != 
              (select min(v) from D ) ) > 1.5))

...应该做技巧!

但要提前思考...以下版本也可能有用;)

  WITH D as(
    SELECT 1 PK, 100.0 v FROM DUAL UNION
    SELECT 1,102.0 FROM DUAL UNION
    SELECT 1,103.0 FROM DUAL UNION
    SELECT 1,105.0 FROM DUAL UNION
    SELECT 1,107.0 FROM DUAL)
    SELECT  PK,avg(v)
    FROM D
    where (v < (select max(v) from D group by PK)
           and ((select max(v) from D group by PK)
                -(select max(v) from D where v != 
                  (select max(v) from D group by PK) group by PK) > 1.5))
    or
    (v > (select min(v) from D group by PK)
           and ((select min(v) from D group by PK)
                +(select min(v) from D where v != 
                  (select min(v) from D group by PK) group by PK) > 1.5))
    GROUP BY PK

在现实生活中,尽管您也会在大型数据集中考虑上述执行计划(作业)。

为了进行任何进一步的澄清,我可以通过评论来掌握。

真诚的,ted

最新更新