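As I understand it, pseudo-randomness becomes true randomness when the generated sequence of values has no actual pattern; in essence, the sequence of random elements could then run indefinitely without repeating itself.
I know that random.py's seed() is designed to get as far as possible from the "pseudo" character (it uses the current timestamp, machine parameters, and so on), which is fine in the majority of cases. But what if one needs to mathematically ensure zero predictability?
I have read that true randomness can be achieved when we seed() from specific physical events such as radioactive decay. What if, for instance, I used an array derived from a recorded audio stream?
The following is an example of how I am overriding the default random.seed() behaviour for this purpose. I am using the sounddevice library, which implements bindings to the services responsible for managing I/O sound devices.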
# original random imports here
# ...
from sounddevice import rec
__all__ = ["module level functions here"]
# original random constants here
# ...
# sounddevice related constants
# ----------------------------------------------------------------------
# FS: Sampling Frequency in Hz (samples per second);
# DURATION: Duration of the recorded audio stream (seconds);
#   *Note: increasing the duration will result in a slower generator, since
#          the seed method must wait for the entire stream to be recorded
#          before processing further.
# CHANNELS: Number of audio channels used by the recording function (rec);
# DTYPE: Data type of the np.ndarray returned by rec;
#   *Note: dtype can also be a np.dtype object. E.g., np.dtype("float64").
FS = 48000
DURATION = 0.1
CHANNELS = 2
DTYPE = 'float64'
# ----------------------------------------------------------------------
# The class implements a custom random generator with a seed obtained
# through the default audio input device.
# It's a subclass of random.Random that overrides only the seed method;
# it records an audio stream with the default parameters and returns the
# content in a newly created np.ndarray.
# Then the array's elements are added together and some transformations
# are performed on the sum, in order to obtain a less uniform float.
# This operation causes the randomness to concern the decimal part in
# particular, which is subject to high fluctuation, even when the noise
# of the surrounding environment is homogeneous over time.
# *Note: the blocking parameter suspends the execution until the entire
# stream is recorded, otherwise the np array will be partially empty.
# *Note: when the seed argument is specified and different than None,
# SDRandom will behave exactly like its superclass
class SDRandom(Random):

    def seed(self, a=None, version=2):
        if a is None:
            stream = rec(frames=round(FS * DURATION),
                         samplerate=FS,
                         channels=CHANNELS,
                         dtype=DTYPE,
                         blocking=True
                         )
            # Sum and Standard Deviation of the flattened ndarray.
            sum_, std_ = stream.sum(), stream.std()
            # round() determines the result's sign.
            b = sum_ - round(sum_)
            # Collecting a number of exponents based on the std's digits.
            # Non-digit characters (".", "e", "-") are skipped so that
            # int(c) cannot fail on values such as "12.5" or "1e-05".
            e = [1 if int(c) % 2 else -1
                 for c in str(std_).strip("0.") if c.isdigit()]
            a = b * 10 ** sum(e)
        # Forward the version so an explicit seed behaves like the superclass.
        super().seed(a, version)
# ----------------------------------------------------------------------
# Create one instance, seeded from an audio stream, and export its
# methods as module-level functions.
# The functions share state across all uses.
_inst = SDRandom()
# binding class methods to module level functions here
# ...
## ------------------------------------------------------
## ------------------ fork support ---------------------
if hasattr(_os, "fork"):
    _os.register_at_fork(after_in_child=_inst.seed)

if __name__ == '__main__':
    _test()  # See random._test() definition.
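According to theory, my implementation still does not achieve true randomness. How is that possible? How can audio input be deterministic in any way, even when considering the following?
"This operation causes the randomness to concern the decimal part in particular, which is subject to high fluctuation, even when the noise of the surrounding environment is homogeneous over time."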
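You are better off just using the secrets module for "real" randomness. It gives you data from your kernel's CSPRNG, which should be constantly collecting and mixing in new entropy in a way designed to make life very hard for any attacker.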
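As a rough illustration of what that looks like in practice, here is a minimal sketch using only the standard library. The variable names and the list of colours are placeholders chosen for this example, not anything from the question.

```python
import random
import secrets

# 32 bytes straight from the operating system's CSPRNG.
token = secrets.token_bytes(32)

# A fair die roll: randbelow(6) returns an int in [0, 6).
roll = secrets.randbelow(6) + 1

# Pick one element of a non-empty sequence.
pick = secrets.choice(["red", "green", "blue"])

# random.SystemRandom exposes the familiar random.Random API on top of
# os.urandom(); it keeps no internal state, so seed() has no effect on it.
sys_rng = random.SystemRandom()
sample = sys_rng.random()  # float in [0.0, 1.0)

print(token.hex(), roll, pick, sample)
```

If you want to keep calling methods such as random(), uniform() or shuffle(), random.SystemRandom is probably the smallest change, since it is a drop-in subclass of random.Random.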
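Your use of "infinite" is not appropriate either: you cannot run something for "infinitely long", because the heat death of the universe will occur a long time before then.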
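Using the standard Mersenne Twister (as Python's random module does) also seems inappropriate, because an attacker can recover the full state after observing just 624 consecutive 32-bit outputs. Using a CSPRNG makes this much harder, and constantly mixing in new state, as your kernel probably does, hardens it further.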
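To make the state-recovery point concrete, here is a sketch of the well-known "untempering" approach against CPython's random.Random. The helper names (undo_right_xor, undo_left_xor, untemper, clone_from_outputs) are made up for this example, and it assumes the attacker sees 624 consecutive full 32-bit outputs, obtained here through getrandbits(32).

```python
import random


def undo_right_xor(y, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    # Each pass fixes another `shift` bits, starting from the top.
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def undo_left_xor(y, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    # Each pass fixes another `shift` bits, starting from the bottom.
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(y):
    """Undo the MT19937 output tempering, in reverse order."""
    y = undo_right_xor(y, 18)
    y = undo_left_xor(y, 15, 0xEFC60000)
    y = undo_left_xor(y, 7, 0x9D2C5680)
    y = undo_right_xor(y, 11)
    return y


def clone_from_outputs(outputs):
    """Build a generator that continues where the observed one left off."""
    if len(outputs) != 624:
        raise ValueError("exactly 624 consecutive 32-bit outputs are needed")
    words = tuple(untemper(o) for o in outputs)
    clone = random.Random()
    # State layout: (version, 624 state words + position index, gauss_next).
    # Index 624 tells the generator to produce the next block first.
    clone.setstate((3, words + (624,), None))
    return clone


victim = random.Random(2024)  # stands in for a generator we cannot inspect
observed = [victim.getrandbits(32) for _ in range(624)]

clone = clone_from_outputs(observed)
predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)
```

Note that nothing here depends on how the victim was seeded, so an audio-derived seed would not protect against it.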
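Finally, treating the samples as floats and then taking the sum and standard deviation does not seem appropriate. You would be much better off leaving them as ints and passing them through a cryptographic hash. For example: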
import hashlib
import random
import sounddevice as sd
samples = sd.rec(
    frames=1024,
    samplerate=48000,
    channels=2,
    dtype='int32',
    blocking=True,
)
rv = int.from_bytes(hashlib.sha256(samples).digest(), 'little')
print(rv)
random.seed(rv)
print(random.random())
But then again, please just use secrets; it is a much better option.
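Note: recent versions of the Linux, Windows, OSX, FreeBSD and OpenBSD kernels all work as I have described above. They make good attempts at gathering entropy and mixing it into a CSPRNG in a sensible way; for example, see Fortuna.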