我有一个大的数据文件,看起来像这样:
1 6
1 6
2 7
3 2
3 6
1 7
1 9
2 9
1 5
3 9
3 1
2 8
我想按第一列对数据进行分组,找到每个第一列值的第二列平均值,然后按第二列平均值对这些分组进行排序。所以输出应该是:
2 8
1 6.6
3 4.5
我的代码现在看起来像这样,不工作:
CREATE EXTERNAL TABLE as (a STRING, b INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
LOCATION 's3n://myfolder/hive';
CREATE EXTERNAL TABLE output(a STRING, avgb DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
LOCATION 's3n://myfolder/hive';
load data inpath "s3n://myfolder/file.txt" into TABLE as;
insert overwrite output select a, avg(b) from as group by a order by avg(b) DESC limit 1000;
我应该注意到,以下确实工作,但有些东西不工作的顺序和插入步骤,为我在SQL中工作:
select a, avg(b) from as group by a;
当我尝试:
select a, avg(b) from as group by a order by avg(b);
我得到"FAILED: Error in semantic analysis: Line 1:66无效的表别名或列引用'b':(可能的列名是:_col0, _col1).
只需在子查询中将其移出:
select a
from (select a, avg(b) as avgb from as group by a) as t
order by avgb;