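I have a table with id, name, and score columns, and I am trying to extract the top-scoring users. Each user may have multiple entries, so I want to sum the scores, grouped by user.
I have looked into JOIN operations, but they seem to be meant for two separate tables rather than two "views" of a single table.
The problem is that if the id field is present, the user has no name, and vice versa.
A minimal example can be found at the following link: http://sqlfiddle.com/#!9/ce0629/11
Basically, I have the following data: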
id name score
--- ----- ------
1 '' 15
4 '' 20
NULL 'paul' 8
NULL 'paul' 11
1 '' 13
4 '' 17
NULL 'simon' 9
NULL 'simon' 12
What I ultimately want is:
id/name score
-------- ------
4 37
1 28
'simon' 21
'paul' 19
I can easily group by id, but that puts every NULL id into a single group, when those rows actually belong to two separate users.
SELECT id, SUM(score) AS total FROM posts GROUP BY id ORDER by total DESC;
id score
--- ------
NULL 40
4 37
1 28
Thanks in advance.
Update
The target environment for this query is Hive. Below are the query and output when looking only at the id field:
hive> SELECT SUM(score) as total, id FROM posts WHERE id is not NULL GROUP BY id ORDER BY total DESC LIMIT 10;
...
OK
29735 87234
20619 9951
20030 4883
19314 6068
17386 89904
13633 51816
13563 49153
13386 95592
12624 63051
12530 39677
Running the query below gives exactly the same output:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by coalesce(id, name) order by total desc limit 10;
Running the following query, which groups by the new computed column name idname, produces an error:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by idname order by total desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:83 Invalid table alias or column reference 'idname': (possible column names are: score, id, name)
Your id looks numeric. In some databases, using coalesce() on a number and a string can be a problem. In any case, I would suggest being explicit about the types:
select coalesce(cast(id as varchar(255)), name) as id_name,
sum(score) as total
from posts
group by id_name
order by total desc;
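To check the idea against the sample data from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made purely for local testing; type names and casting rules differ between engines). It casts id to text and repeats the full expression in GROUP BY rather than the alias, since the error message in the question suggests Hive does not resolve the alias there.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (an assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, "", 15), (4, "", 20), (None, "paul", 8), (None, "paul", 11),
        (1, "", 13), (4, "", 17), (None, "simon", 9), (None, "simon", 12),
    ],
)

# Cast id to text so both COALESCE arguments share a type, and repeat the
# expression in GROUP BY instead of relying on the id_name alias.
query = """
    SELECT COALESCE(CAST(id AS TEXT), name) AS id_name,
           SUM(score) AS total
    FROM posts
    GROUP BY COALESCE(CAST(id AS TEXT), name)
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for id_name, total in rows:
    print(id_name, total)
```

Because of the cast, the ids come back as strings, so the result column has a single type throughout.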
SELECT new_id, SUM(score) AS total FROM
(SELECT coalesce(id,name) new_id, score FROM posts)o
GROUP BY new_id ORDER by total DESC;
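The derived-table form gives the computed column a real name before grouping happens, so the outer query can group by new_id directly. Below is a rough sketch of that approach on the question's sample data, again using Python's sqlite3 module in place of Hive (an assumption; Hive may additionally need an explicit cast when COALESCE mixes a numeric id with a string name). The sum is named total so that ORDER BY can refer to it.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (an assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, "", 15), (4, "", 20), (None, "paul", 8), (None, "paul", 11),
        (1, "", 13), (4, "", 17), (None, "simon", 9), (None, "simon", 12),
    ],
)

# The inner query computes new_id once; the outer query groups by that
# column name, so no alias lookup is needed inside GROUP BY.
query = """
    SELECT new_id, SUM(score) AS total
    FROM (SELECT COALESCE(id, name) AS new_id, score FROM posts) o
    GROUP BY new_id
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
for new_id, total in totals:
    print(new_id, total)
```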
You can use COALESCE to get the non-NULL value from either column:
SELECT
COALESCE(id, name) AS id
, SUM(score) AS total
FROM
posts
GROUP BY
COALESCE(id, name)
ORDER by total DESC;