Python: choose k random numbers from (1, n), excluding the numbers in a list

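Given exclude_list = [3, 5, 8], n = 30, k = 5

I want to pick 5 (k) random numbers between 1 and 30, but I must not pick any number that appears in exclude_list.

Assume that both exclude_list and n can be large.

When nothing needs to be excluded, getting k random samples is easy: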

rand_numbers = sample(range(1, n), k)

So to get the answer, I can do:

sample(set(range(1, n)) - set(exclude_list), k)

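I have read that range keeps only one number in memory at a time. I am not sure how that affects the two lines above.

My first question: does the following code put all n numbers in memory, or only one number at a time?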

rand_numbers = sample(range(1, n), k)

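My second question: if the code above does keep only one number in memory at a time, can I do something similar with the additional constraint of the exclusion list?

The example notes in sample's docstring say:

To choose a sample in a range of integers, use range as an argument. This is especially fast and space efficient for sampling from a large population: sample(range(10000000), 60)

I can test this on my machine: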

In [11]: sample(range(100000000), 3)
Out[11]: [70147105, 27647494, 41615897]
In [12]: list(range(100000000))  # crash/takes a long time
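
As a rough check of the docstring's claim, the sketch below compares the memory footprint of a range object with that of a much shorter list, then samples from the range directly. A range does not hold its values one at a time. It stores only start, stop and step and computes each value on demand. The exact byte counts are CPython implementation details and will vary, and the variable names are illustrative only.

```python
import random
import sys

big = range(1, 100_000_000)

# A range object stores only start, stop and step, so its size does not
# grow with the number of values it represents.
range_size = sys.getsizeof(big)

# A list stores a reference to every element, so even 999 items cost more.
small_list_size = sys.getsizeof(list(range(1, 1000)))

# random.sample only needs len() and indexing, both of which range
# supports without materializing its values.
picked = random.sample(big, 5)

print(range_size, small_list_size)
print(picked)
```

This is why sample(range(1, n), k) stays cheap for large n, while set(range(1, n)) - set(exclude_list) builds all n values in memory before sampling.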

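One way to sample efficiently with an exclusion list is to use the same range trick, but use the bisect module to "skip over" the exclusions (once the exclusions are sorted, we can do this in O(k * log(len(exclude_list))) time):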

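import bisect
import random

def sample_excluding(n, k, excluding):
    """Sample k distinct integers from range(n), never picking one in excluding."""
    # Assumes every excluded value lies in range(n).
    # If excluding is already unique and sorted, sorted(set(...)) can be dropped.
    # skips[i] = (i-th excluded value) - i, its position once earlier exclusions are removed.
    skips = [j - i for i, j in enumerate(sorted(set(excluding)))]
    # Sample from a range shortened by the number of excluded values.
    s = random.sample(range(n - len(skips)), k)
    # Shift each sample up by the number of exclusions it has to jump over.
    return [i + bisect.bisect_right(skips, i) for i in s]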
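
To see why the offset works, it may help to trace the mapping by hand for a small case. The sketch below uses n = 10 and the exclusions [2, 4, 7], the same values as the first call in the session that follows. The names mapping and the module-level variables are illustrative and are not part of the function above.

```python
import bisect

n = 10
excluding = [2, 4, 7]

# skips[i] is the i-th excluded value minus the number of excluded values
# before it: the first compressed index that must jump past that exclusion.
skips = [j - i for i, j in enumerate(sorted(set(excluding)))]

# Map every compressed index 0 .. n - len(skips) - 1 to a value in range(n).
mapping = {i: i + bisect.bisect_right(skips, i) for i in range(n - len(skips))}

print(skips)    # [2, 3, 5]
print(mapping)  # {0: 0, 1: 1, 2: 3, 3: 5, 4: 6, 5: 8, 6: 9}
```

Each of the 7 compressed indices lands on a distinct allowed value, and 2, 4 and 7 are never produced, so sampling the compressed indices uniformly samples the allowed values uniformly.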

We can see that it works:

In [21]: sample_excluding(10, 3, [2, 4, 7])
Out[21]: [6, 3, 9]
In [22]: sample_excluding(10, 3, [1, 2, 8])
Out[22]: [0, 4, 3]
In [23]: sample_excluding(10, 6, [1, 2, 8])
Out[23]: [0, 7, 9, 6, 3, 5]

In particular, we did this without using O(n) memory:

In [24]: sample_excluding(10000000, 6, [1, 2, 8])
Out[24]: [1495143, 270716, 9490477, 2570599, 8450517, 8283229]
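
The question asks for values between 1 and n, whereas sample_excluding draws from range(n), that is 0 to n - 1 (note the 0 in some of the outputs above). One possible adaptation, offered only as a sketch: shift the exclusions down by one, sample, then shift the results back up. The name sample_excluding_from_1 is hypothetical, and the sketch assumes that n is meant to be inclusive and that exclusions outside 1..n can be ignored.

```python
import bisect
import random


def sample_excluding(n, k, excluding):
    """Sample k distinct values from range(n), skipping those in excluding."""
    skips = [j - i for i, j in enumerate(sorted(set(excluding)))]
    s = random.sample(range(n - len(skips)), k)
    return [i + bisect.bisect_right(skips, i) for i in s]


def sample_excluding_from_1(n, k, exclude_list):
    """Hypothetical wrapper: sample k distinct values from 1..n inclusive."""
    # Keep only exclusions that lie inside 1..n, shifted to the 0-based range.
    shifted = [x - 1 for x in exclude_list if 1 <= x <= n]
    # Sample in 0..n-1, then shift each result back to 1..n.
    return [i + 1 for i in sample_excluding(n, k, shifted)]


result = sample_excluding_from_1(30, 5, [3, 5, 8])
print(result)  # five distinct values from 1..30, never 3, 5 or 8
```

Filtering the exclusions first matters because the length of the shortened range is computed from the number of exclusions, so an out-of-range entry would otherwise shrink the range incorrectly.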
