按月分组后发现最高



在postgres中,我想输出具有最高编号的人员。";discussed"每个月的请求,无论年份如何,即应该有12个输出。

ID  PERSON      REQUEST   DATE
4   datanoise   opened  2010-09-02 
5   marsuboss   opened  2010-09-02 
6   m3talsmith  opened  2010-09-06 
7   sferik  opened  2010-09-08 
8   sferik  opened  2010-09-09 
8   dtrasbo discussed   2010-09-09 
8   brianmario  discussed   2010-09-09 
8   sferik  discussed   2010-09-09 
9   rsim    opened  2011-09-09 
.....more tuples to follow

*这只是数据库的一小部分。还假设数据集足够大,所有月份都在日期列中表示。

测试数据

CREATE TEMPORARY TABLE foo( id SERIAL PRIMARY KEY, name INTEGER NOT NULL,
dt DATE NULL, request BOOL NOT NULL );
INSERT INTO foo (name,dt,request) SELECT random()*1000, 
'2010-01-01'::DATE+('1 DAY'::INTERVAL)*(random()*3650), random()>0.5 
FROM generate_series(1,100000) n;
SELECT * FROM foo LIMIT 10;
id | name |     dt     | request
----+------+------------+---------
1 |  110 | 2014-11-05 | f
2 |  747 | 2015-03-12 | t
3 |  604 | 2014-09-26 | f
4 |  211 | 2011-12-14 | t
5 |  588 | 2016-12-15 | f
6 |   96 | 2012-02-19 | f
7 |   17 | 2018-09-18 | t
8 |  591 | 2018-02-15 | t
9 |  370 | 2015-07-28 | t
10 |  844 | 2019-05-16 | f

现在您必须获得每个名称和月份的计数,然后获得最大计数,但这不会给您具有最大值的名称,这需要与之前的结果连接起来。为了只做一次分组,它在CTE中完成:

WITH totals AS (
SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo 
WHERE request=true GROUP BY name,mon
)
SELECT * FROM 
(SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
JOIN totals USING (mon,cnt);

如果几个名字有相同的最大计数,它们将被同时返回。如果要只保留一个,可以使用distrit ON:

WITH (same as above)
SELECT DISTINCT ON (mon) * FROM
(SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
JOIN totals USING (mon,cnt) ORDER BY mon,name;

您还可以使用DISTINCT ON每月只保留一行,由ORDER子句指定,在本例中为count desc,因此它保留最高的计数。

SELECT DISTINCT ON (mon) * FROM (
SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo 
WHERE request=true GROUP BY name,mon
)x ORDER BY mon, cnt DESC;

…或者您可以通过将主键粘贴到传递给max()的数组中来破解argmax()函数,这意味着它将返回具有最大值的行id:

SELECT mon, cntid[1] cnt, name FROM
(SELECT mon, max(ARRAY[cnt,id]) cntid FROM (
SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt, min(id) id FROM foo
WHERE request=true GROUP BY name,mon
) x GROUP BY mon)y
JOIN foo ON (foo.id=cntid[2]);

哪个更快?…

给定表名为t01,列date为date1(字符串格式):

create temp table t02 as 
select  extract(month from CAST(date1 as date)) as month, person,  count(*) nb from t01 where request = 'discussed' group by 1, 2 ;
create temp table t03 as 
select month, max(nb) max_nb from t02 group by 1 ;

的结果是:

select month , person from t02 a natural join t03 b where a.nb = b.max_nb;

https://rextester.com/BYMM84335[: run here]1

我推荐distinct on。如果要将所有月份合并为单个"uber-month":

select distinct on (extract(month from date)) person, extract(month from date), count(*) as num_discussed
from t
where request = 'discussed'
group by person, extract(month from date)
order by extract(month from date), num_discussed desc;

Distinct on是一个非常方便的Postgres扩展。它按"组"的每行返回,"组"由括号中的表达式定义。该行是"第一行"。由order by条款决定。

如果你想要最高的月份,而不考虑年份:

select distinct on (extract(month from date)) person, date_trunc('month', date), count(*) as num_discussed
from t
where request = 'discussed'
group by person, date_trunc('month', date)
order by extract(month from date), num_discussed desc;

相关内容

  • 没有找到相关文章

最新更新