Average over a hard-to-define partition



I have this table:

create table t (value int, dt date);
 value |     dt     
-------+------------
    10 | 2012-10-30
    15 | 2012-10-29
  null | 2012-10-28
  null | 2012-10-27
     7 | 2012-10-26

I want this output:

 value |     dt     
-------+------------
    10 | 2012-10-30
     5 | 2012-10-29
     5 | 2012-10-28
     5 | 2012-10-27
     7 | 2012-10-26

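With the rows ordered by date descending, I want each run of null values, together with the one non-null value that precedes it, to be replaced by that non-null value averaged over all the rows involved. In this example, 15 is the previous non-null value for the two nulls that follow it, so each of the three rows becomes 15 / 3 = 5.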

SQL Fiddle

I found a surprisingly simple solution:

SELECT max(value) OVER (PARTITION BY grp)
      / count(*)  OVER (PARTITION BY grp) AS value
      ,dt
FROM   (
   SELECT *, count(value) OVER (ORDER BY dt DESC) AS grp
   FROM   t
   ) a;

-> sqlfiddle

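Since count() ignores NULL values, a running count (which is what count() gives you with the default frame of a window function that has an ORDER BY) quickly assigns each row to a group (-> grp).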

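Every group has exactly one non-null value, so min, max or sum all return the same result in a second window function. Divide by the number of members in grp (count(*) this time, so that NULL values are counted too!) and we are done.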
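To make the grouping step easier to follow, here is a minimal Python sketch of the same idea, outside the database. It is an illustration only, not part of the original answer: it assumes the rows are already sorted by dt descending, that dt is unique, and that values are non-negative (so Python's floor division matches integer division in the SQL query). The names grp, group_size and group_value are chosen for this example.

```python
from collections import Counter
from itertools import accumulate

# Sample rows from the question, already ordered by dt descending.
rows = [
    (10, "2012-10-30"),
    (15, "2012-10-29"),
    (None, "2012-10-28"),
    (None, "2012-10-27"),
    (7, "2012-10-26"),
]

# Running count of non-NULL values, the equivalent of
# count(value) OVER (ORDER BY dt DESC). A NULL does not advance the
# counter, so it lands in the group of the preceding non-NULL row.
grp = list(accumulate(1 if value is not None else 0 for value, _ in rows))

# count(*) per group: every row is counted, NULLs included.
group_size = Counter(grp)

# max(value) per group: the single non-NULL value of that group.
group_value = {g: value for (value, _), g in zip(rows, grp) if value is not None}

result = []
for (_, dt), g in zip(rows, grp):
    v = group_value.get(g)  # None for leading NULLs with no previous value
    result.append((None if v is None else v // group_size[g], dt))

for value, dt in result:
    print(value, dt)
```

Here grp comes out as 1, 2, 2, 2, 3, so the value 15 and the two nulls after it share group 2 and each receive 15 // 3.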

As a puzzle, this is a solution... in practice, it may perform horribly depending on the nature of your data. In any case, watch your indexes:

create database tmp;
create table t (value float, dt date); -- if you use int, you need to care about rounding
insert into t values (10, '2012-10-30'), (15, '2012-10-29'), (null, '2012-10-28'), (null, '2012-10-27'), (7, '2012-10-26');
select t1.dt, t1.value, t2.dt, t2.value, count(*) cnt 
from t t1, t t2, t t3 
where 
    t2.dt >= t1.dt and t2.value is not null 
    and not exists (
        select * 
        from t 
        where t.dt < t2.dt and t.dt >= t1.dt and t.value is not null
    ) 
    and t3.dt <= t2.dt 
    and not exists (
        select * 
        from t where t.dt >= t3.dt and t.dt < t2.dt and t.value is not null
    ) 
group by t1.dt;
+------------+-------+------------+-------+-----+
| dt         | value | dt         | value | cnt |
+------------+-------+------------+-------+-----+
| 2012-10-26 |     7 | 2012-10-26 |     7 |   1 |
| 2012-10-27 |  NULL | 2012-10-29 |    15 |   3 |
| 2012-10-28 |  NULL | 2012-10-29 |    15 |   3 |
| 2012-10-29 |    15 | 2012-10-29 |    15 |   3 |
| 2012-10-30 |    10 | 2012-10-30 |    10 |   1 |
+------------+-------+------------+-------+-----+
5 rows in set (0.00 sec)
select dt, value/cnt 
from (
    select t1.dt , t2.value, count(*) cnt 
    from t t1, t t2, t t3 
    where 
        t2.dt >= t1.dt and t2.value is not null 
        and not exists (
            select * 
            from t 
            where t.dt < t2.dt and t.dt >= t1.dt and t.value is not null
        ) 
    and t3.dt <= t2.dt 
    and not exists (
        select * 
        from t 
        where t.dt >= t3.dt and t.dt < t2.dt and t.value is not null
    ) 
    group by t1.dt
) x;
+------------+-----------+
| dt         | value/cnt |
+------------+-----------+
| 2012-10-26 |         7 |
| 2012-10-27 |         5 |
| 2012-10-28 |         5 |
| 2012-10-29 |         5 |
| 2012-10-30 |        10 |
+------------+-----------+
5 rows in set (0.00 sec)

Explanation:

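  • t1 is the original table
  • t2 is the row with the smallest date on or after t1's date that has a non-null value
  • t3 is every row in between (t2 itself plus the nulls just before it by date), so we can group by the others and count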
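The roles of t1, t2 and t3 may be easier to see when written as nested loops. The following Python sketch mirrors the join conditions of the query above on the sample data; it is an illustration under the assumption that dt is unique, and the function name fill_by_self_join is made up for this example.

```python
from datetime import date

# Sample rows from the question, as (value, dt) pairs.
rows = [
    (10.0, date(2012, 10, 30)),
    (15.0, date(2012, 10, 29)),
    (None, date(2012, 10, 28)),
    (None, date(2012, 10, 27)),
    (7.0, date(2012, 10, 26)),
]


def fill_by_self_join(rows):
    out = []
    for _, t1_dt in rows:  # t1: every row of the original table
        # t2: the non-NULL row with the smallest date on or after t1.dt
        candidates = [(dt, v) for v, dt in rows if dt >= t1_dt and v is not None]
        if not candidates:
            continue  # like the inner join, drop rows that have no t2
        t2_dt, t2_value = min(candidates)
        # t3: rows on or before t2 with no non-NULL row in [t3.dt, t2.dt),
        # that is, t2 itself plus the NULL rows directly before it by date
        cnt = sum(
            1
            for _, t3_dt in rows
            if t3_dt <= t2_dt
            and not any(v is not None and t3_dt <= dt < t2_dt for v, dt in rows)
        )
        out.append((t1_dt, t2_value / cnt))
    return sorted(out)


for dt, value in fill_by_self_join(rows):
    print(dt, value)
```

Each t1 row triggers a scan for t2 and a nested scan for t3, which is the reason the join can get slow on larger tables.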

Sorry, I can't make it any clearer. It confuses me too :-)
