Getting the max id by grouping two columns as a pair

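I've searched a lot for a solution that gets the max id using a group by on two columns of a dataset as a pair, but none of the queries I found and tried gave the expected result. Here is a sample dataset: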

The table has the columns id, tour_id, p1, stage, rnd, assoc1, p2, assoc2 and winner; the rows look like this:

id	tour_id	p1	assoc1
996057	5277	107028	32	107028
996101	5277	107028	Main Draw	16
996126	5277	107028	Main Draw	8
996133	5277	107028	Main Draw
996139	5277	107028	Main Draw	2
996037	5277	116620	Main Draw	32
996037	5277	121582
996097	5277	121582
996121	5277	121582		TPE	121582
996132	5277	121582	4
996139	5277	121582	Main Draw	2
996324	5278	107028	Main Draw	32
996362	5278	107028	Main Draw	16
996379	5278	107028	Main Draw	8
996390	5278	107028	Main Draw
996283	5278	1116620	64	KOR
996313	5278	121582	Main Draw	32
996357	5278	121582	16	AUT
996380	5278	121582		TPE	102761
998765		5299	101222	Main Draw	64	101222
998788		5299	101222	Main Draw	32	ENG
998801	5299	101222	Main Draw
998807		5299	101222	Main Draw	8
998810	5299	101222	Main Draw
998812	5299	101222	Main Draw	2
998773		5299	107028		Main Draw	64	GER	107028
99879		5299	107028		Main Draw	32	107028
998805	5299	107028	Main Draw	16
998809	5299	107028	Main Draw
998811		5299	107028		Main Draw	4	GER	POR	107028
998812	5299	107028
998757		5299	116620	64	GER	101192	ITA	116620
998794	5299	1116620	32	115449
998801	5299	1116620	Main Draw	16

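Usually, I'd use a simple query to get the max id for the criteria, then use it as a subquery or a join depending on the use case. Take a look at this fiddle: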

https://dbfiddle.uk/K1wM0gEK

I inserted your data, followed by a series of queries. Here's the first one, just to get the maxID for each combination of tour_id and p1:

select tour_id, p1, max(id) as maxID 
from t group by tour_id, p1;

You can then use that in a subquery to retrieve any rows matching those ids, like so:

select * from t
where id in (
select max(id)
from t group by tour_id, p1
);

Or as a JOIN:

select t.* from t
join (
select max(id) as maxID
from t group by tour_id, p1
) ids on t.id = ids.maxID;

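For larger datasets, a JOIN is often more performant than IN, but that isn't a hard rule, and where the line falls isn't well defined. I've just put it here for reference.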

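Now, these queries should return the same results, but the id you're taking the max of doesn't appear to be unique, so they don't: matching on id alone also pulls in rows from other groups that happen to share that id. Which answer is right really depends on what you're trying to achieve. Here's another option using window functions, which is really overkill, but let's take a look: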

select tour_id, p1, 
first_value(id) OVER (partition by tour_id, p1 order by id desc) as maxID,
first_value(stage) OVER (partition by tour_id, p1 order by id desc) as stage,
first_value(rnd) OVER (partition by tour_id, p1 order by id desc) as rnd,
first_value(assoc1) OVER (partition by tour_id, p1 order by id desc) as assoc1,
first_value(p2) OVER (partition by tour_id, p1 order by id desc) as p2,
first_value(assoc2) OVER (partition by tour_id, p1 order by id desc) as assoc2,
first_value(winner) OVER (partition by tour_id, p1 order by id desc) as winner
from t

Now, this returns many more rows, but a lot of them are duplicates, so let's add DISTINCT to get the unique ones:

select DISTINCT tour_id, p1, 
first_value(id) OVER (partition by tour_id, p1 order by id desc) as maxID,
first_value(stage) OVER (partition by tour_id, p1 order by id desc) as stage,
first_value(rnd) OVER (partition by tour_id, p1 order by id desc) as rnd,
first_value(assoc1) OVER (partition by tour_id, p1 order by id desc) as assoc1,
first_value(p2) OVER (partition by tour_id, p1 order by id desc) as p2,
first_value(assoc2) OVER (partition by tour_id, p1 order by id desc) as assoc2,
first_value(winner) OVER (partition by tour_id, p1 order by id desc) as winner
from t

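Now we have something that looks more like what you're after. For comparison, I've lined up the three queries side by side, ordered by id, with the columns in the same order: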

select DISTINCT 
first_value(id) OVER (partition by tour_id, p1 order by id desc) as maxID,
tour_id, p1, 
first_value(stage) OVER (partition by tour_id, p1 order by id desc) as stage,
first_value(rnd) OVER (partition by tour_id, p1 order by id desc) as rnd,
first_value(assoc1) OVER (partition by tour_id, p1 order by id desc) as assoc1,
first_value(p2) OVER (partition by tour_id, p1 order by id desc) as p2,
first_value(assoc2) OVER (partition by tour_id, p1 order by id desc) as assoc2,
first_value(winner) OVER (partition by tour_id, p1 order by id desc) as winner
from t order by 1;

select * from t
where id in (
select max(id)
from t group by tour_id, p1
) order by id;

select t.* from t
join (
select max(id) as maxID
from t group by tour_id, p1
) ids on t.id = ids.maxID
order by t.id;
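
To see why these queries can disagree, here is a minimal sketch on a made-up table (`demo` and its four rows are hypothetical, not your data) in which the same id appears under two different p1 values. It assumes a dialect that accepts `DROP TABLE IF EXISTS` and multi-row `INSERT`, such as MySQL, PostgreSQL or SQLite.

```sql
-- Hypothetical table and rows, for illustration only.
DROP TABLE IF EXISTS demo;
CREATE TABLE demo (id INT, tour_id INT, p1 INT, rnd INT);

INSERT INTO demo (id, tour_id, p1, rnd) VALUES
  (10, 1, 100, 4),
  (20, 1, 100, 2),   -- max id for group (1, 100)
  (20, 1, 200, 8),   -- same id 20, but NOT the max for group (1, 200)
  (30, 1, 200, 2);   -- max id for group (1, 200)

-- Matching on id alone: 20 is the max of one group, so every row
-- carrying id 20 comes back, including the (1, 200) row that is not a max.
SELECT *
FROM demo
WHERE id IN (SELECT MAX(id) FROM demo GROUP BY tour_id, p1)
ORDER BY id, p1;

-- Carrying the grouping columns through the join ties each max id
-- back to its own group, so the stray (1, 200) row with id 20 drops out.
SELECT d.*
FROM demo d
JOIN (
  SELECT tour_id, p1, MAX(id) AS maxID
  FROM demo
  GROUP BY tour_id, p1
) ids
  ON  d.tour_id = ids.tour_id
  AND d.p1      = ids.p1
  AND d.id      = ids.maxID
ORDER BY d.id;
```

The second form still returns more than one row for a group if the same id repeats inside that group, so it is a partial fix rather than a substitute for a unique key.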

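The result set from the window-function query appears to match your desired output, but let me say that window functions seem like overkill for such a simple case, so I'm wondering whether you have a unique ID somewhere. If the table has no unique primary (auto-increment) ID, you should add one. It will save you a lot of headaches at some point down the line. If you do have one, I'm wondering why we aren't using it instead of the non-unique one.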
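
If you do stay with window functions, `ROW_NUMBER()` is a common alternative to repeating `first_value` for every column. This is a sketch of a different technique, not what the fiddle uses. It runs on a hypothetical table `demo_rn` with made-up rows and assumes window-function support (for example MySQL 8+, PostgreSQL, or SQLite 3.25+).

```sql
-- Hypothetical table and rows, for illustration only.
DROP TABLE IF EXISTS demo_rn;
CREATE TABLE demo_rn (id INT, tour_id INT, p1 INT, rnd INT);

INSERT INTO demo_rn (id, tour_id, p1, rnd) VALUES
  (10, 1, 100, 4),
  (20, 1, 100, 2),
  (20, 1, 200, 8),
  (30, 1, 200, 2);

-- Number the rows of each (tour_id, p1) group from the highest id down,
-- then keep only the first row of each group.
SELECT id AS maxID, tour_id, p1, rnd
FROM (
  SELECT d.*,
         ROW_NUMBER() OVER (PARTITION BY tour_id, p1 ORDER BY id DESC) AS rn
  FROM demo_rn d
) ranked
WHERE rn = 1
ORDER BY maxID, p1;
```

Against your own table you would swap in t and list stage, assoc1, p2, assoc2 and winner in the outer select. Because the filter keeps exactly one row per group, no DISTINCT is needed.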

Let me know if this helps, or if anything is unclear.
