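This is the function I want:
random_select(contain_list, ttl_num, sample_num)
There are ttl_num integers to choose from, 0 through ttl_num-1. I want to return a list of sample_num unique integers in which every number given in contain_list must appear, and the remaining numbers are chosen at random.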
I have to run this query very often, each time with a different contain_list, but ttl_num and sample_num are the same for all queries.
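What I do at the moment is first generate the set of ttl_num integers, subtract contain_list from that set, randomly select some numbers from what is left without replacement, and then concatenate those with contain_list to get the result.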
I believe this is not the fastest way. Any better ideas?
Global variables may be used if needed.
Edit: sample_num is no smaller than the length of contain_list, and I want to get contain_list plus sample_num - contain_list.length other random numbers. The numbers in contain_list are guaranteed to be in the range 0 to ttl_num-1.
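For reference, the set-difference approach described in the question might look roughly like the following in Python. This is only a sketch under assumptions: the question does not show its code, the name random_select_baseline is hypothetical, and the standard random module stands in for whatever generator is actually used.

```python
import random

def random_select_baseline(contain_list, ttl_num, sample_num):
    """Hypothetical baseline: set difference, sample the rest, concatenate."""
    required = list(contain_list)
    # Every integer in [0, ttl_num) that is not already required.
    pool = set(range(ttl_num)) - set(required)
    # random.sample needs a sequence, so turn the set into a list first.
    extras = random.sample(list(pool), sample_num - len(required))
    return extras + required

print(random_select_baseline([3, 7], 10, 5))
```

Building the full set costs time proportional to ttl_num on every call, which is the part the answers below try to reduce.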
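Here are a couple of possibilities. Neither is more complex than what you already have, but one or both may be faster depending on the sizes of the parameter values. Only benchmarking with your actual data will tell for sure.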
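Method 1
The logic here is basically the same as what you are already doing. It just replaces the generation and manipulation of a set with an integer array, which should be lighter weight. It does, however, require contain_list to be sorted (descending), so whether it actually runs faster than what you already have likely depends on the sizes of contain_list.count and ttl_num.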
1) initialize a tracking var, remaining_num = ttl_num
2) initialize an integer array, available_num[], of length ttl_num with value = index
3) sort contain_list descending
4) iterate through contain_list (now in descending order); for each:
4.1) decrement remaining_num
4.2) swap the element at index = contain_list[i] with the one at index = remaining_num
5) iterate (sample_num - contain_list.count) times; for each:
5.1) generate a random index between 0 and remaining_num (inclusive and exclusive, respectively)
5.2) decrement remaining_num
5.3) swap the element at the selected index with the one at index = remaining_num
6) The resultant samples will start at index remaining_num and run through the end of the array.
Here is a walk-through of random_select({3, 7}, 10, 5)...
remaining_num = 10
available_num[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
contain_list = {7, 3}
select the 7
remaining_num = 9
available_num[] = {0, 1, 2, 3, 4, 5, 6, 9, 8, 7}
select the 3
remaining_num = 8
available_num[] = {0, 1, 2, 8, 4, 5, 6, 9, 3, 7}
select a random(0,8), e.g. 2
remaining_num = 7
available_num[] = {0, 1, 9, 8, 4, 5, 6, 2, 3, 7}
select a random(0,7), e.g. 3
remaining_num = 6
available_num[] = {0, 1, 9, 6, 4, 5, 8, 2, 3, 7}
select a random(0,6), e.g. 0
remaining_num = 5
available_num[] = {5, 1, 9, 6, 4, 0, 8, 2, 3, 7}
result = {0, 8, 2, 3, 7}
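The Method 1 steps can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the name random_select_swap and the injectable randrange parameter are assumptions added here so that the walk-through can be replayed with fixed picks.

```python
import random

def random_select_swap(contain_list, ttl_num, sample_num, randrange=random.randrange):
    """Method 1 sketch: park every chosen value at the tail of the array."""
    remaining_num = ttl_num                       # step 1
    available_num = list(range(ttl_num))          # step 2: value == index
    for value in sorted(contain_list, reverse=True):   # steps 3 and 4
        remaining_num -= 1
        # Descending order guarantees available_num[value] is still value here.
        available_num[value], available_num[remaining_num] = (
            available_num[remaining_num], available_num[value])
    for _ in range(sample_num - len(contain_list)):    # step 5
        index = randrange(0, remaining_num)       # 0 inclusive, remaining_num exclusive
        remaining_num -= 1
        available_num[index], available_num[remaining_num] = (
            available_num[remaining_num], available_num[index])
    return available_num[remaining_num:]          # step 6

# Replay the walk-through with the same "random" picks 2, 3, 0.
picks = iter([2, 3, 0])
print(random_select_swap([3, 7], 10, 5, randrange=lambda lo, hi: next(picks)))
# prints [0, 8, 2, 3, 7]
```

The descending sort matters because a swap may move an unprocessed contain_list value out of the slot matching its own index. For example, processing {3, 9} in ascending order would move 9 into slot 3 before 9 itself is handled.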
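Method 2
If ttl_num is large enough and sample_num is low enough, it may be worth turning things around. That is, rather than creating and manipulating an array of available numbers, track only the list of selected numbers. Then, as each random target is chosen, "skip over" the previously selected numbers by iterating through the list of selected numbers and counting those that are less than or equal to the target index.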
1) initialize a tracking var, remaining_num = ttl_num - contain_list.count
2) declare an empty list (vector) of integers, selected_num[]
3) sort contain_list ascending
4) iterate through contain_list (now in ascending order); for each:
4.1) append contain_list[i] to selected_num[]
5) iterate (sample_num - contain_list.count) times; for each:
5.1) generate a random target between 0 and remaining_num (inclusive and exclusive, respectively)
5.2) decrement remaining_num
5.3) iterate through selected_num (kept in ascending order); for each:
5.3.1) if target >= selected_num[j], increment target; otherwise stop iterating
5.4) insert target into selected_num[] at the position where the iteration stopped, so the list stays ascending
6) The resultant samples will be all elements in selected_num.
Here is a walk-through of random_select({3, 7}, 10, 5)...
remaining_num = 8
selected_num[] = {}
contain_list = {3, 7}
select the 3
selected_num[] = {3}
select the 7
selected_num[] = {3, 7}
select a random(0,8), e.g. target = 2
remaining_num = 7
2 < 3; target still 2, stop
selected_num[] = {2, 3, 7}
select a random(0,7), e.g. target = 3
remaining_num = 6
3 >= 2; target becomes 4
4 >= 3; target becomes 5
5 < 7; target still 5, stop
selected_num[] = {2, 3, 5, 7}
select a random(0,6), e.g. target = 0
remaining_num = 5
0 < 2; target still 0, stop
selected_num[] = {0, 2, 3, 5, 7}
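Obviously, if sample_num is large, iterating through selected_num[] for every new pick can get expensive. Keeping selected_num[] sorted in ascending order mitigates this somewhat, because the inner loop can break as soon as it sees a number greater than the target; insert the target at that point to keep the list sorted. The ascending order also matters for correctness: if the list were walked as {4, 3}, a target of 3 would not be bumped by the 4, would then be bumped by the 3 to become 4, and 4 is already selected.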
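A possible Python sketch of Method 2 with the selected list kept in ascending order is shown below. The name random_select_skip and the injectable randrange parameter are assumptions made for illustration, and bisect.insort from the standard library is used for the sorted insert.

```python
import bisect
import random

def random_select_skip(contain_list, ttl_num, sample_num, randrange=random.randrange):
    """Method 2 sketch: track only the selected numbers, kept ascending."""
    selected_num = sorted(contain_list)
    extra = sample_num - len(selected_num)
    remaining_num = ttl_num - len(selected_num)
    for _ in range(extra):
        # target is a rank among the numbers that are not selected yet.
        target = randrange(0, remaining_num)   # 0 inclusive, remaining_num exclusive
        remaining_num -= 1
        for chosen in selected_num:
            if target >= chosen:
                target += 1                    # skip over an already selected number
            else:
                break                          # everything after this is larger still
        bisect.insort(selected_num, target)    # keep the list ascending
    return selected_num

# Replay the picks 2, 3, 0 for random_select({3, 7}, 10, 5).
picks = iter([2, 3, 0])
print(random_select_skip([3, 7], 10, 5, randrange=lambda lo, hi: next(picks)))
# prints [0, 2, 3, 5, 7]
```

Note that this sketch returns the sample in ascending order rather than in selection order. If the order of the result matters, shuffle it afterwards or record the picks in a second list.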
I just wrote some code similar to Method 1 from James Droscha's answer in a vectorized way using numpy, and it turned out to be only a few lines:
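import numpy as np

def random_select(batch, ttl_num, sample_num):
    # batch must hold unique integers; if that is not guaranteed, add:
    # batch = np.unique(batch)
    batch = np.asarray(batch, dtype=int)
    batch_size = len(batch)
    cutoff = ttl_num - batch_size
    # step 2: identity array, value == index
    candidates = np.arange(ttl_num)
    # step 4: overwrite each batch slot below cutoff with a tail value that is
    # not in batch, so that candidates[:cutoff] contains no element of batch
    tail_free = np.setdiff1d(np.arange(cutoff, ttl_num), batch)
    candidates[batch[batch < cutoff]] = tail_free
    # step 5: draw the remaining samples without replacement
    idx = np.random.choice(cutoff, sample_num - batch_size, replace=False)
    return np.concatenate([candidates[idx], batch])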