Merging and aggregating two columns in SQL


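I have a table with id, name, and score columns, and I am trying to extract the highest-scoring users. Each user may have multiple entries, so I want to sum the scores, grouped by user.

I have looked into JOIN operations, but they seem to be used when there are two separate tables rather than two "views" of a single table.

The problem is that if the id field is present, the user has no name, and vice versa.

A minimal example can be found at the following link: http://sqlfiddle.com/#!9/ce0629/11

Basically, I have the following data: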

id    name    score
---   -----   ------
1     ''      15
4     ''      20
NULL  'paul'   8
NULL  'paul'  11
1     ''      13
4     ''      17
NULL  'simon'  9
NULL  'simon' 12

What I ultimately want is:

id/name     score
--------    ------
4           37
1           28
'simon'     21
'paul'      19

I can easily group by id, but that treats all the NULL rows as a single group, when in fact they belong to two separate users.

SELECT id, SUM(score) AS total FROM posts GROUP BY id ORDER by total DESC;

id    score
---   ------
NULL  40
4     37
1     28

Thanks in advance.

Update

The target environment for this query is Hive. Below are the query that looks only at the id field and its output:

hive> SELECT SUM(score) as total, id FROM posts WHERE id is not NULL GROUP BY id ORDER BY total DESC LIMIT 10;
...
OK
29735   87234
20619   9951
20030   4883
19314   6068
17386   89904
13633   51816
13563   49153
13386   95592
12624   63051
12530   39677

Running the query below gives exactly the same output:

hive> select coalesce(id, name) as idname, sum(score) as total from posts group by coalesce(id, name) order by total desc limit 10;

Running the following query, which groups by the new computed column name idname, produces an error:

hive> select coalesce(id, name) as idname, sum(score) as total from posts group by idname order by total desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:83 Invalid table alias or column reference 'idname': (possible column names are: score, id, name)

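Your id looks like it is numeric. In some databases, using coalesce() on a number and a string can be a problem. In any case, I would suggest being explicit about the types: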

select coalesce(cast(id as varchar(255)), name) as id_name,
       sum(score) as total
from posts
group by coalesce(cast(id as varchar(255)), name)
order by total desc;
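
To see why the explicit cast can matter, here is a minimal sketch that compares the key types with and without it. It assumes SQLite (through Python's sqlite3 module) as a stand-in engine, and key_types is a hypothetical helper introduced only for this illustration. SQLite is dynamically typed, so it tolerates a mixed key; Hive and other engines may coerce or reject the mixed expression differently.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the real posts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, "", 15), (None, "paul", 8), (1, "", 13), (None, "paul", 11)],
)


def key_types(key_expr):
    """Return the sorted SQLite storage types that key_expr produces (hypothetical helper)."""
    rows = conn.execute(f"SELECT DISTINCT typeof({key_expr}) FROM posts").fetchall()
    return sorted(t for (t,) in rows)


# Without the cast, the grouping key mixes integers and strings.
mixed = key_types("COALESCE(id, name)")
# With the cast, every key is a string.
uniform = key_types("COALESCE(CAST(id AS TEXT), name)")

print(mixed)
print(uniform)
```

Casting first means every engine sees a single, unambiguous key type, which is the point of the suggestion above.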
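
Alternatively, compute the combined key in a subquery and group by it in the outer query:

SELECT new_id, SUM(score) AS total FROM
(SELECT coalesce(id, name) AS new_id, score FROM posts) o
GROUP BY new_id ORDER BY total DESC;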

You can use COALESCE to get the non-NULL value from either column:

SELECT 
COALESCE(id, name) AS id
, SUM(score) AS total 
FROM 
posts 
GROUP BY 
COALESCE(id, name)
ORDER by total DESC;
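
As a quick sanity check, the query can be run against the sample data from the question. The sketch below assumes an in-memory SQLite database via Python's sqlite3 module, not Hive, and top_users is a hypothetical helper name. Type coercion and alias rules differ between engines, so this only confirms the grouping logic.

```python
import sqlite3

# Sample rows from the question: each user has either an id or a name.
ROWS = [
    (1, "", 15), (4, "", 20), (None, "paul", 8), (None, "paul", 11),
    (1, "", 13), (4, "", 17), (None, "simon", 9), (None, "simon", 12),
]


def top_users(rows):
    """Sum scores per user, keyed by id when present and by name otherwise."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER, name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    # The expression is repeated in GROUP BY instead of using the alias,
    # since the question shows Hive rejecting the alias there.
    result = conn.execute(
        """
        SELECT COALESCE(id, name) AS id_name, SUM(score) AS total
        FROM posts
        GROUP BY COALESCE(id, name)
        ORDER BY total DESC
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for id_name, total in top_users(ROWS):
        print(id_name, total)
```

On the sample data this gives the four per-user totals from the question (37, 28, 21, 19) instead of a single NULL group.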
