我有一个由calculate_table()
过程生成的下表:
table(
id integer,
type integer
)
我希望做一个分层随机样本,我选择一个随机 id,按类型随机化,并返回类型的计数和 id 的计数。
所以在下面的示例中:
id,type
1,1
2,1
3,1
4,1
5,2
6,2
7,2
8,3
9,4
随机化可以选择以下内容:
chosen_type: 2
-- how many unique types are there overall
type_count: 4
chosen_id: 6
-- of the type matching chosen_id, how many
-- instances are there.
id_count: 3
因此,获得类型 2 的几率为 25%,如果选择类型 2,则有 33% 的机会获得 ID 6。
以下内容不起作用,因为它是从所有id中随机选择的,与它们的类型无关,这不是我想要的。
select * from calculate_table()
order by random()
limit 1;
我在尝试避免多次调用calculate_table()
过程和/或将内容存储在数组中时遇到麻烦。我该怎么做?
with t(id,type) as (values
(1,1),(2,1),(3,1),(4,1),(5,2),(6,2),(7,2),(8,3),(9,4)
), dt as (
select type, id
from t
group by 1,2
order by random()
limit 1
)
select
type as chosen_type,
(select count(distinct type) from t) as type_count,
id as chosen_id,
(select count(distinct id) from t where type = dt.type) as id_count
from dt;
chosen_type | type_count | chosen_id | id_count
-------------+------------+-----------+----------
2 | 4 | 6 | 3
你可以使用窗口函数来实现这一点,按 random() 排序。
有关示例,请参阅此 SQLfiddle。
select *
from (
select type,
row_number() over( order by random() ) as type_random,
id,
row_number() over( partition by type order by random() ) as id_random
from calculate_table()
) as a
where type_random = 1
and id_random = 1