按平均排序,然后使用Hive插入到新表中



我有一个大的数据文件,看起来像这样:

    1   6
    1   6
    2   7
    3   2
    3   6
    1   7
    1   9
    2   9
    1   5
    3   9
    3   1
    2   8

我想按第一列对数据进行分组,找到每个第一列值的第二列平均值,然后按第二列平均值对这些分组进行排序。所以输出应该是:

    2   8
    1   6.6
    3   4.5

我的代码现在看起来像这样,不工作:

    CREATE EXTERNAL TABLE as (a STRING, b INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
    LOCATION 's3n://myfolder/hive';
    CREATE EXTERNAL TABLE output(a STRING, avgb DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
    LOCATION 's3n://myfolder/hive';
    load data inpath "s3n://myfolder/file.txt" into TABLE as;
    insert overwrite output select a, avg(b) from as group by a order by avg(b) DESC limit 1000;

我应该注意到,以下确实工作,但有些东西不工作的顺序和插入步骤,为我在SQL中工作:

    select a, avg(b) from as group by a;

当我尝试:

    select a, avg(b) from as group by a order by avg(b);

我得到"FAILED: Error in semantic analysis: Line 1:66无效的表别名或列引用'b':(可能的列名是:_col0, _col1).

只需在子查询中将其移出:

select a
from (select a, avg(b) as avgb from as group by a) as t
order by avgb;

相关内容

  • 没有找到相关文章

最新更新