How to select entire rows based on a distinct column

I am doing this in Spark.

cityId  PhysicalAddress      EmailAddress         ..many other columns of other meta info...   
1       b st                 something@email.com   
1       b st                 something@email.com   <- some rows can be entirely duplicates
1       a avenue             random@gmail.com
2       c square             anything@yahoo.com
2       d blvd               d@d.com

There is no primary key on this table. I want to get one arbitrary row for each distinct cityId.

For example, this is a correct answer:

cityId  PhysicalAddress      EmailAddress        ..many other columns 
1       b st                 something@email.com   
2       c square             anything@yahoo.com

For example, this is also a correct answer:

cityId  PhysicalAddress      EmailAddress       ..many other columns 
1       a avenue             random@gmail.com
2       c square             anything@yahoo.com

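One approach that comes to mind is GROUP BY. However, that requires me to apply an aggregate function (such as min()) to the other columns, whereas I just want to pull out one whole row (no matter which one).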
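To illustrate why per-column aggregates are not the same as picking a row, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS. The two rows and their email addresses are hypothetical, chosen so that the minimum of each column comes from a different row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
)
# Hypothetical data: the smallest address and the smallest email
# belong to different rows of the same city.
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        (1, "b st", "alpha@example.com"),
        (1, "a avenue", "zulu@example.com"),
    ],
)

# MIN() is evaluated per column, so the result can mix values from several rows.
mixed = conn.execute(
    "SELECT cityId, MIN(PhysicalAddress), MIN(EmailAddress) "
    "FROM contacts GROUP BY cityId"
).fetchall()
original = conn.execute("SELECT * FROM contacts").fetchall()

print(mixed)                 # the combined "row" produced by GROUP BY
print(mixed[0] in original)  # whether that row actually exists in the table
```

With this data the grouped result pairs 'a avenue' with 'alpha@example.com', a combination that appears in no source row, which is why a plain GROUP BY does not answer the question.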

;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY cityId ORDER BY cityId) AS RN
FROM [TABLE_NAME]
) SELECT * FROM CTE WHERE RN = 1

I have SQL Server 2008 R2, but I tried to find an approach that also works on other DBMSs.
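As a rough check that the ROW_NUMBER() approach carries over to another engine, the sketch below runs essentially the same query on SQLite through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the table name contacts and the alias rn are only illustrative.

```python
import sqlite3

ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
)
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", ROWS)

# Number the rows inside each cityId partition and keep only the first one.
# Ordering by the partition key leaves the choice within a city up to the engine.
QUERY = """
WITH c AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY cityId ORDER BY cityId) AS rn
    FROM contacts
)
SELECT cityId, PhysicalAddress, EmailAddress
FROM c
WHERE rn = 1
ORDER BY cityId
"""
picked = conn.execute(QUERY).fetchall()
print(picked)
```

Which row is returned for each city is not specified, only that exactly one existing row comes back per cityId.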

create table contacts( cityId int, PhysicalAddress varchar(max), EmailAddress varchar(max) )
delete contacts
insert contacts( cityId, PhysicalAddress, EmailAddress )    /** ..many other columns of other meta info...  */  
values 
( 1, 'b st', 'something@email.com' )  
, ( 1, 'b st', 'something@email.com' )                     /* some rows can be entirely duplicates */
, ( 1, 'a avenue', 'random@gmail.com' )
, ( 2, 'c square', 'anything@yahoo.com' )
, ( 2, 'd blvd', 'd@d.com' )
, ( 3, 'e circuit', 'e@e.com' )
-- using row_number()
-- (the leading semicolon terminates the preceding INSERT, which a CTE requires)
;with c as (
select *, row_number() over (partition by cityId order by cityId) as seqnum
from contacts
)
select * from c
where seqnum = 1;

-- Add a new identity column
alter table contacts
add id int identity(1,1)
select * from contacts where id in (select min(id) from contacts group by cityID)
-- Variation: Create a copy into a temp table and add an identity column
-- Note: It may not be possible to modify original table
select * into #contacts from contacts
alter table #contacts
add id int identity(1,1)
select * from #contacts where id in (select min(id) from #contacts group by cityID)
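The min(id) idea can be tried on another engine without altering the table, under the assumption that the engine exposes some built-in row identifier. The sketch below uses SQLite's implicit rowid (available on ordinary SQLite tables) in place of the identity column; this is a substitution for illustration, not the SQL Server mechanism itself.

```python
import sqlite3

ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
    (3, "e circuit", "e@e.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
)
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", ROWS)

# rowid plays the role of the identity column: it is unique per row,
# so MIN(rowid) per cityId identifies exactly one row, even among duplicates.
first_rows = conn.execute(
    "SELECT cityId, PhysicalAddress, EmailAddress "
    "FROM contacts "
    "WHERE rowid IN (SELECT MIN(rowid) FROM contacts GROUP BY cityId) "
    "ORDER BY cityId"
).fetchall()
print(first_rows)
```

Because the surrogate key is unique, this returns a single row per city even when rows are complete duplicates, which a filter on the data columns alone could not do.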

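I also tried a computed column based on newid(), but my excitement was short-lived: when you join the table to itself or use a subquery on that table, the computed column is recalculated for each SELECT, so that doesn't work. You also can't make that computed column PERSISTED. That is not allowed for non-deterministic expressions like newid(), which returns something different each time it is called on a given row.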
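One way to sidestep the stored random column, sketched here as an alternative rather than as a fix for the computed-column attempt, is to evaluate the random value inside the window ordering so it is used only once per query. The example uses SQLite's random() via Python's sqlite3 and assumes SQLite 3.25 or newer; on SQL Server the corresponding expression would be ORDER BY NEWID() inside the OVER clause.

```python
import sqlite3

ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
    (3, "e circuit", "e@e.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
)
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", ROWS)

# Shuffle the rows within each city and keep the first one of each shuffle.
QUERY = """
WITH c AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY cityId ORDER BY random()) AS rn
    FROM contacts
)
SELECT cityId, PhysicalAddress, EmailAddress
FROM c
WHERE rn = 1
ORDER BY cityId
"""
random_pick = conn.execute(QUERY).fetchall()
print(random_pick)  # the chosen row per city can differ from run to run
```

The random value is never stored or joined on, so the recalculation problem described above does not arise; the trade-off is that repeated runs are free to return different rows.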
