在postgres中,我想输出具有最高编号的人员。";discussed"每个月的请求,无论年份如何,即应该有12个输出。
ID PERSON REQUEST DATE
4 datanoise opened 2010-09-02
5 marsuboss opened 2010-09-02
6 m3talsmith opened 2010-09-06
7 sferik opened 2010-09-08
8 sferik opened 2010-09-09
8 dtrasbo discussed 2010-09-09
8 brianmario discussed 2010-09-09
8 sferik discussed 2010-09-09
9 rsim opened 2011-09-09
.....more tuples to follow
*这只是数据库的一小部分。还假设数据集足够大,所有月份都在日期列中表示。
测试数据
CREATE TEMPORARY TABLE foo( id SERIAL PRIMARY KEY, name INTEGER NOT NULL,
dt DATE NULL, request BOOL NOT NULL );
INSERT INTO foo (name,dt,request) SELECT random()*1000,
'2010-01-01'::DATE+('1 DAY'::INTERVAL)*(random()*3650), random()>0.5
FROM generate_series(1,100000) n;
SELECT * FROM foo LIMIT 10;
id | name | dt | request
----+------+------------+---------
1 | 110 | 2014-11-05 | f
2 | 747 | 2015-03-12 | t
3 | 604 | 2014-09-26 | f
4 | 211 | 2011-12-14 | t
5 | 588 | 2016-12-15 | f
6 | 96 | 2012-02-19 | f
7 | 17 | 2018-09-18 | t
8 | 591 | 2018-02-15 | t
9 | 370 | 2015-07-28 | t
10 | 844 | 2019-05-16 | f
现在您必须获得每个名称和月份的计数,然后获得最大计数,但这不会给您具有最大值的名称,这需要与之前的结果连接起来。为了只做一次分组,它在CTE中完成:
WITH totals AS (
SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo
WHERE request=true GROUP BY name,mon
)
SELECT * FROM
(SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
JOIN totals USING (mon,cnt);
如果几个名字有相同的最大计数,它们将被同时返回。如果要只保留一个,可以使用distrit ON:
WITH (same as above)
SELECT DISTINCT ON (mon) * FROM
(SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
JOIN totals USING (mon,cnt) ORDER BY mon,name;
您还可以使用DISTINCT ON每月只保留一行,由ORDER子句指定,在本例中为count desc,因此它保留最高的计数。
SELECT DISTINCT ON (mon) * FROM (
SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo
WHERE request=true GROUP BY name,mon
)x ORDER BY mon, cnt DESC;
…或者您可以通过将主键粘贴到传递给max()的数组中来破解argmax()函数,这意味着它将返回具有最大值的行id:
SELECT mon, cntid[1] cnt, name FROM
(SELECT mon, max(ARRAY[cnt,id]) cntid FROM (
SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt, min(id) id FROM foo
WHERE request=true GROUP BY name,mon
) x GROUP BY mon)y
JOIN foo ON (foo.id=cntid[2]);
哪个更快?…
给定表名为t01,列date为date1(字符串格式):
create temp table t02 as
select extract(month from CAST(date1 as date)) as month, person, count(*) nb from t01 where request = 'discussed' group by 1, 2 ;
create temp table t03 as
select month, max(nb) max_nb from t02 group by 1 ;
的结果是:
select month , person from t02 a natural join t03 b where a.nb = b.max_nb;
https://rextester.com/BYMM84335[: run here]1
我推荐distinct on
。如果要将所有月份合并为单个"uber-month":
select distinct on (extract(month from date)) person, extract(month from date), count(*) as num_discussed
from t
where request = 'discussed'
group by person, extract(month from date)
order by extract(month from date), num_discussed desc;
Distinct on
是一个非常方便的Postgres扩展。它按"组"的每行返回,"组"由括号中的表达式定义。该行是"第一行"。由order by
条款决定。
如果你想要最高的月份,而不考虑年份:
select distinct on (extract(month from date)) person, date_trunc('month', date), count(*) as num_discussed
from t
where request = 'discussed'
group by person, date_trunc('month', date)
order by extract(month from date), num_discussed desc;