I'm doing this in Spark.
cityId PhysicalAddress EmailAddress ..many other columns of other meta info...
1 b st something@email.com
1 b st something@email.com <- some rows can be entirely duplicates
1 a avenue random@gmail.com
2 c square anything@yahoo.com
2 d blvd d@d.com
There is no primary key on this table. I want to get one arbitrary row for each distinct cityId.
For example, this is a correct answer:
cityId PhysicalAddress EmailAddress ..many other columns
1 b st something@email.com
2 c square anything@yahoo.com
And this is also a correct answer:
cityId PhysicalAddress EmailAddress ..many other columns
1 a avenue random@gmail.com
2 c square anything@yahoo.com
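One approach that comes to mind is GROUP BY. However, that requires me to apply an aggregate function (such as min()) to the other columns, whereas I just want to pull out one whole row (it doesn't matter which one).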
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY cityId ORDER BY cityId) AS RN
FROM [TABLE_NAME]
) SELECT * FROM CTE WHERE RN = 1
I'm on SQL Server 2008 R2, but I'm trying to find an approach that also works on other DBMSs.
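As a rough portability check, the same ROW_NUMBER() pattern can be tried on SQLite through Python's sqlite3 module. This is only a sketch: it assumes SQLite 3.25 or later (window function support), and the table name `contacts`, the function name `one_row_per_city`, and the trimmed three-column schema are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (cityId, PhysicalAddress, EmailAddress).
SAMPLE_ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
]

def one_row_per_city(rows):
    """Return one arbitrary row per cityId using ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts "
        "(cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
    query = """
        WITH c AS (
            SELECT cityId, PhysicalAddress, EmailAddress,
                   ROW_NUMBER() OVER (PARTITION BY cityId ORDER BY cityId) AS rn
            FROM contacts
        )
        SELECT cityId, PhysicalAddress, EmailAddress
        FROM c
        WHERE rn = 1
        ORDER BY cityId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

picked = one_row_per_city(SAMPLE_ROWS)
print(picked)  # one row per cityId; which row is chosen is not guaranteed
```

Because the ORDER BY inside the window is on the partition column itself, the engine is free to pick any row of each group, which matches the "any row will do" requirement.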
create table contacts( cityId int, PhysicalAddress varchar(max), EmailAddress varchar(max) )
delete contacts
insert contacts( cityId, PhysicalAddress, EmailAddress ) /** ..many other columns of other meta info... */
values
( 1, 'b st', 'something@email.com' )
, ( 1, 'b st', 'something@email.com' ) /* some rows can be entirely duplicates */
, ( 1, 'a avenue', 'random@gmail.com' )
, ( 2, 'c square', 'anything@yahoo.com' )
, ( 2, 'd blvd', 'd@d.com' )
, ( 3, 'e circuit', 'e@e.com' )
-- using row_number()
-- (the leading semicolon terminates the preceding INSERT, which T-SQL requires before WITH)
;with c as (
select *, row_number() over (partition by cityId order by cityId) as seqnum
from contacts
)
select * from c
where seqnum = 1;
-- Add a new identity column
-- (run the ALTER in its own batch, otherwise the SELECT fails with "Invalid column name 'id'")
alter table contacts
add id int identity(1,1)
go
select * from contacts where id in (select min(id) from contacts group by cityId)
-- Variation: Create a copy into a temp table and add an identity column
-- Note: It may not be possible to modify original table
select * into #contacts from contacts
alter table #contacts
add id int identity(1,1)
select * from #contacts where id in (select min(id) from #contacts group by cityID)
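For what it's worth, the identity-column idea seems to carry over to engines that already expose a row identifier. Below is a minimal sketch on SQLite via Python's sqlite3, using SQLite's implicit `rowid` in place of the added `id` column. The function name `first_row_per_city` and the three-column schema are assumptions for illustration only.

```python
import sqlite3

# Same sample rows as the insert script: (cityId, PhysicalAddress, EmailAddress).
ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
    (3, "e circuit", "e@e.com"),
]

def first_row_per_city(rows):
    """Keep the row with the smallest rowid in each cityId group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts "
        "(cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
    )
    # On a fresh table, rowid values follow insertion order (1, 2, 3, ...).
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT cityId, PhysicalAddress, EmailAddress FROM contacts "
        "WHERE rowid IN (SELECT MIN(rowid) FROM contacts GROUP BY cityId) "
        "ORDER BY cityId"
    ).fetchall()
    conn.close()
    return result

first_rows = first_row_per_city(ROWS)
print(first_rows)
```

Unlike the window-function version, this always returns the earliest inserted row of each group, so it is deterministic rather than arbitrary.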
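I also tried using newid() in a computed column, but my excitement was short-lived: when you join the table to itself or use a subquery on it, the computed column is recalculated for each SELECT, so that doesn't work. You can't make the computed column PERSISTED either. That isn't allowed for a non-deterministic expression like newid(), which returns something different every time it is called on a given row.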
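One way to sidestep the computed column might be to put the random value directly in the window's ORDER BY, so that it is evaluated once per row inside a single query and never has to be persisted. On SQL Server that would be `row_number() over (partition by cityId order by newid())`. The sketch below uses SQLite's random() from Python's sqlite3 instead, purely so it can be run anywhere. It assumes SQLite 3.25 or later, and `random_row_per_city` and the three-column schema are hypothetical names for illustration.

```python
import sqlite3

# Same sample rows as the insert script: (cityId, PhysicalAddress, EmailAddress).
CONTACT_ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
    (3, "e circuit", "e@e.com"),
]

def random_row_per_city(rows):
    """Return one randomly chosen row per cityId."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts "
        "(cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
    query = """
        WITH c AS (
            SELECT cityId, PhysicalAddress, EmailAddress,
                   -- random() stands in for T-SQL's newid() here
                   ROW_NUMBER() OVER (PARTITION BY cityId ORDER BY random()) AS seqnum
            FROM contacts
        )
        SELECT cityId, PhysicalAddress, EmailAddress
        FROM c
        WHERE seqnum = 1
        ORDER BY cityId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

random_rows = random_row_per_city(CONTACT_ROWS)
print(random_rows)  # the row picked for cityId 1 and 2 can differ between runs
```

Since the duplicate 'b st' row appears twice for cityId 1, it is twice as likely to be chosen as 'a avenue'; deduplicating first would even that out if it matters.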