Correctly setting random seeds for repeatability



Setting the random seed with the Fortran 90 subroutine random_seed is straightforward:

call random_seed( put=seed )

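But I can't find any guidelines on how to choose the seed (which is absolutely necessary when you want repeatability). Folklore I've heard in the past suggests that a scalar seed should be large, e.g. 123456789 is a better seed than 123. The only support for this I can find online is the recommendation, for the ifort extension function ran(), to use a "large, odd integer value".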

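I understand this may be implementation specific, and I am using gfortran 4.8.5, but I am also interested in ifort and (if possible) in general guidelines that are independent of the implementation. Here is some example code: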

! for compactness, assume seed size of 4, but it will depend on
! the implementation (e.g. for my version of gfortran 4.8.5 it is 12)
! every value must fit in a default (typically 32-bit) integer
seed1(1:4) = [ 123456789, 987654321, 456789123, 789123456 ]
seed2(1:4) =   123456789
seed3(1:4) = [         1,         2,         3,         4 ]

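I'd guess that seed1 is fine, but it is pretty verbose if you set it by hand (as I do), since the seed length can be 12 or 33 or whatever. And I'm not even sure it is fine, because I haven't been able to find any guidelines at all on setting these seeds. That is, for all I know these seeds should be negative, or 3-digit even numbers, etc., although I guess you'd hope the implementation would warn you about that (?).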

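seed2 and seed3 are obviously more convenient to set, and for all I know they are just as good. @Ross shows that seed2 is in fact fine in his answer here: Random number generator (RNG/PRNG) that returns updated value of seed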

So my question, in summary, is just: how do I correctly set the seed? Are any or all of seed1 to seed3 acceptable?

Guidelines for setting the seed depend on the PRNG algorithm that RANDOM_NUMBER uses, but in general the more "entropy" you provide, the better.

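If you have a single scalar value, you can use some simple PRNG to expand it into the full seed array required by RANDOM_SEED. See, e.g., the function lcg in the example code at https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gfortran/RANDOM_005fSEED.html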

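Current versions of GFortran have some protection against bad seeds and should be relatively immune to "dumb" seeding (e.g. all values of seed(:) identical, or all values small or even zero). For portability to other compilers, though, following something like the scalar-expansion approach suggested above is probably still a good idea.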
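The scalar-to-array expansion described above could be sketched roughly as follows. The helper is modeled on the lcg function in the linked gfortran documentation rather than copied from any compiler's internals; the program name and the scalar value 20180815 are purely illustrative assumptions.

```fortran
program expand_seed
  use iso_fortran_env, only: int64
  implicit none
  integer :: n, i
  integer, allocatable :: seed(:)
  integer(int64) :: s
  real :: x(3)

  call random_seed(size=n)      ! seed length is implementation dependent
  allocate(seed(n))

  s = 20180815_int64            ! illustrative scalar seed (assumption)
  do i = 1, n
     seed(i) = lcg(s)           ! each call advances s and yields one element
  end do

  call random_seed(put=seed)
  call random_number(x)
  print *, 'seed size:  ', n
  print *, 'first draws:', x

contains

  ! Simple multiplicative congruential generator used only to spread one
  ! scalar over the whole seed array. Intended for non-negative s.
  function lcg(s) result(v)
    integer(int64), intent(inout) :: s
    integer :: v
    if (s == 0) then
       s = 104729_int64
    else
       s = mod(s, 4294967296_int64)
    end if
    ! s < 2**32 here, so the product stays well inside the int64 range
    s = mod(s * 279470273_int64, 4294967291_int64)
    v = int(mod(s, int(huge(0), int64)), kind(0))
  end function lcg
end program expand_seed
```

Because the seed length is queried at run time, the same scalar works unchanged whether the implementation wants 12, 33, or 34 integers.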

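What you provide to random_seed( put=... ) is used to determine the starting state of the generator, which (as janneb states) should have as much entropy as reasonably possible. You could construct some relatively sophisticated method of generating this entropy; grabbing it from the system somehow is a good choice. The code janneb links to is a good example.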

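However, I typically want to be able to reproduce a single run from a given seed if necessary. This is useful for debugging and regression testing. For production runs, the code can then pull a single seed "randomly" somehow. I therefore want to get good random numbers from a single "seed". In my experience, this is easily achieved by providing that seed and then letting the generator add entropy by generating numbers. Consider the following example: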

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer, parameter :: n_discard = 100
   integer :: state_size, i
   integer, allocatable, dimension(:) :: state
   real(wp) :: ran, oldran

   call random_seed( size=state_size )
   write(*,*) '-- state size is: ', state_size
   allocate(state(state_size))

   ! -- Simple method of initializing seed from single scalar
   state = 20180815
   call random_seed( put=state )

   ! -- 'Prime' the generator by pulling the first few numbers
   ! -- In reality, these would be discarded but I will print them for demonstration
   ran = 0.5_wp
   do i=1,n_discard
      oldran = ran
      call random_number(ran)
      write(*,'(a,i3,2es26.18)') 'iter, ran, diff: ', i, ran, ran-oldran
   enddo

   ! Now the RNG is 'ready'
end program main

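Here, I give a seed and then generate a random number 100 times. Typically, I would discard these initial, potentially corrupt, numbers. In this example, I print them to see whether they look non-random. Running with PGI 15.10: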

enet-mach5% pgfortran --version
pgfortran 15.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2015, NVIDIA CORPORATION.  All rights reserved.
enet-mach5% pgfortran main.f90 && ./a.out
-- state size is:            34
iter, ran, diff:   1  8.114813341476008191E-01  3.114813341476008191E-01
iter, ran, diff:   2  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   3  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   4  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   5  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   6  2.172220012214012286E-01 -5.942593329261995905E-01
iter, ran, diff:   7  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   8  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   9  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  10  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  11  6.229626682952016381E-01  4.057406670738004095E-01
iter, ran, diff:  12  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  13  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  14  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  15  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  16  2.870333536900204763E-02 -5.942593329261995905E-01
iter, ran, diff:  17  2.870333536900204763E-02  0.000000000000000000E+00
iter, ran, diff:  18  4.344440024428024572E-01  4.057406670738004095E-01
iter, ran, diff:  19  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  20  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  21  8.401846695166028667E-01  4.057406670738004095E-01
iter, ran, diff:  22  8.401846695166028667E-01  0.000000000000000000E+00
iter, ran, diff:  23  6.516660036642036857E-01 -1.885186658523991809E-01
iter, ran, diff:  24  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  25  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  26  5.740667073800409526E-02 -5.942593329261995905E-01
iter, ran, diff:  27  5.740667073800409526E-02  0.000000000000000000E+00
iter, ran, diff:  28  2.746286719594053238E-01  2.172220012214012286E-01
iter, ran, diff:  29  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  30  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  31  6.803693390332057334E-01  4.057406670738004095E-01
iter, ran, diff:  32  6.803693390332057334E-01  0.000000000000000000E+00
iter, ran, diff:  33  3.033320073284073715E-01 -3.770373317047983619E-01
iter, ran, diff:  34  3.033320073284073715E-01  0.000000000000000000E+00
iter, ran, diff:  35  7.090726744022077810E-01  4.057406670738004095E-01
iter, ran, diff:  36  1.148133414760081905E-01 -5.942593329261995905E-01
iter, ran, diff:  37  1.148133414760081905E-01  0.000000000000000000E+00
iter, ran, diff:  38  1.435166768450102381E-01  2.870333536900204763E-02
iter, ran, diff:  39  1.435166768450102381E-01  0.000000000000000000E+00
iter, ran, diff:  40  3.607386780664114667E-01  2.172220012214012286E-01
iter, ran, diff:  41  7.664793451402118762E-01  4.057406670738004095E-01
iter, ran, diff:  42  7.664793451402118762E-01  0.000000000000000000E+00
iter, ran, diff:  43  2.009233475830143334E-01 -5.655559975571975428E-01
iter, ran, diff:  44  2.009233475830143334E-01  0.000000000000000000E+00
iter, ran, diff:  45  6.353673500258167905E-01  4.344440024428024572E-01
iter, ran, diff:  46  4.110801709961720007E-02 -5.942593329261995905E-01
iter, ran, diff:  47  4.110801709961720007E-02  0.000000000000000000E+00
iter, ran, diff:  48  8.812926866162200668E-01  8.401846695166028667E-01
iter, ran, diff:  49  8.812926866162200668E-01  0.000000000000000000E+00
iter, ran, diff:  50  9.386993573542241620E-01  5.740667073800409526E-02
iter, ran, diff:  51  3.444400244280245715E-01 -5.942593329261995905E-01
iter, ran, diff:  52  7.501806915018249811E-01  4.057406670738004095E-01
iter, ran, diff:  53  9.961060280922282573E-01  2.459253365904032762E-01
iter, ran, diff:  54  9.961060280922282573E-01  0.000000000000000000E+00
iter, ran, diff:  55  8.221603419923440015E-02 -9.138899938929938571E-01
iter, ran, diff:  56  4.879567012730348097E-01  4.057406670738004095E-01
iter, ran, diff:  57  1.109193695682364478E-01 -3.770373317047983619E-01
iter, ran, diff:  58  7.625853732324401335E-01  6.516660036642036857E-01
iter, ran, diff:  59  7.625853732324401335E-01  0.000000000000000000E+00
iter, ran, diff:  60  2.831393817822487335E-01 -4.794459914501914000E-01
iter, ran, diff:  61  6.888800488560491431E-01  4.057406670738004095E-01
iter, ran, diff:  62  7.462867195940532383E-01  5.740667073800409526E-02
iter, ran, diff:  63  8.036933903320573336E-01  5.740667073800409526E-02
iter, ran, diff:  64  8.036933903320573336E-01  0.000000000000000000E+00
iter, ran, diff:  65  1.644320683984688003E-01 -6.392613219335885333E-01
iter, ran, diff:  66  5.701727354722692098E-01  4.057406670738004095E-01
iter, ran, diff:  67  6.849860769482774003E-01  1.148133414760081905E-01
iter, ran, diff:  68  1.481334147600819051E-01 -5.368526621881954952E-01
iter, ran, diff:  69  5.538740818338823146E-01  4.057406670738004095E-01
iter, ran, diff:  70  1.605380964906970576E-01 -3.933359853431852571E-01
iter, ran, diff:  71  5.662787635644974671E-01  4.057406670738004095E-01
iter, ran, diff:  72  7.672021111475118005E-01  2.009233475830143334E-01
iter, ran, diff:  73  6.360901160331167148E-01 -1.311119951143950857E-01
iter, ran, diff:  74  6.647934514021187624E-01  2.870333536900204763E-02
iter, ran, diff:  75  9.231234697231371911E-01  2.583300183210184287E-01
iter, ran, diff:  76  3.288641367969376006E-01 -5.942593329261995905E-01
iter, ran, diff:  77  5.034149292976053403E-02 -2.785226438671770666E-01
iter, ran, diff:  78  3.249701648891658579E-01  2.746286719594053238E-01
iter, ran, diff:  79  4.110801709961720007E-01  8.611000610700614288E-02
iter, ran, diff:  80  7.268168600551945246E-01  3.157366890590225239E-01
iter, ran, diff:  81  1.325575271289949342E-01 -5.942593329261995905E-01
iter, ran, diff:  82  2.147735613282293343E-01  8.221603419923440015E-02
iter, ran, diff:  83  8.951429003614350677E-01  6.803693390332057334E-01
iter, ran, diff:  84  9.606624794444940107E-02 -7.990766524169856666E-01
iter, ran, diff:  85  8.749502748152764298E-01  7.788840268708270287E-01
iter, ran, diff:  86  6.864316089628772488E-01 -1.885186658523991809E-01
iter, ran, diff:  87  3.753116578189263919E-01 -3.111199511439508569E-01
iter, ran, diff:  88  4.614216639259325348E-01  8.611000610700614288E-02
iter, ran, diff:  89  8.632683590919612016E-01  4.018466951660286668E-01
iter, ran, diff:  90  5.110403908483931446E-01 -3.522279682435680570E-01
iter, ran, diff:  91  3.512250603649960112E-01 -1.598153304833971333E-01
iter, ran, diff:  92  2.984351275420635830E-01 -5.278993282293242828E-02
iter, ran, diff:  93  7.902858007228701354E-01  4.918506731808065524E-01
iter, ran, diff:  94  9.136098520217217356E-01  1.233240512988516002E-01
iter, ran, diff:  95  8.360105557375590024E-01 -7.759929628416273317E-02
iter, ran, diff:  96  7.623052313611680120E-01 -7.370532437639099044E-02
iter, ran, diff:  97  2.525198759725810760E-02 -7.370532437639099044E-01
iter, ran, diff:  98  9.228433278518650695E-01  8.975913402546069619E-01
iter, ran, diff:  99  1.283834133499510699E-01 -7.944599145019139996E-01
iter, ran, diff: 100  7.311534560989940701E-01  6.027700427490430002E-01

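Eight of the first 10 numbers generated are repeats of the previous value! This is a great example of why some generators need a high-entropy state to begin with. After "some time", however, the numbers start to look reasonable.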
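Rather than reading the listing by eye, the repeats could be counted automatically. This is a minimal sketch under the same setup as the program above (same scalar seed, same real kind); the program name and variable names are illustrative assumptions, and the count will differ from compiler to compiler.

```fortran
program count_repeats
  implicit none
  integer, parameter :: wp = selected_real_kind(15,307)
  integer, parameter :: n_draws = 100
  integer :: state_size, i, n_repeats
  integer, allocatable :: state(:)
  real(wp) :: x, prev

  call random_seed(size=state_size)
  allocate(state(state_size))
  state = 20180815               ! same low-entropy seeding as in the example
  call random_seed(put=state)

  ! Count how often a draw is bit-for-bit equal to the one before it
  n_repeats = 0
  call random_number(prev)
  do i = 2, n_draws
     call random_number(x)
     if (x == prev) n_repeats = n_repeats + 1
     prev = x
  end do

  write(*,'(a,i0,a,i0,a)') 'repeated draws: ', n_repeats, ' in ', &
       n_draws - 1, ' comparisons'
end program count_repeats
```

As the answer notes, a count of zero only clears a low bar. It rules out the most blatant failure and says nothing about statistical quality.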

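For my applications, 100 or so random numbers is a very small expense, so whenever I seed the generator, I prime it in this way. I have not observed this obviously bad behavior with ifort 16.0, gfortran 4.8, or gfortran 8.1. Non-repeating numbers is a pretty low bar, though, so I would prime the generator for all compilers, not just those where I have observed bad behavior.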
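The seed-then-prime habit described here could be wrapped in a small helper so every caller gets it for free. The module name rng_priming and the subroutine seed_and_prime are hypothetical names introduced only for this sketch; the discard count of 100 simply mirrors the example above.

```fortran
module rng_priming
  implicit none
  private
  public :: wp, seed_and_prime
  integer, parameter :: wp = selected_real_kind(15,307)
contains
  ! Hypothetical helper: seed the intrinsic generator from one scalar,
  ! then throw away the first n_discard draws.
  subroutine seed_and_prime(scalar_seed, n_discard)
    integer, intent(in) :: scalar_seed
    integer, intent(in) :: n_discard
    integer :: state_size, i
    integer, allocatable :: state(:)
    real(wp) :: dummy

    call random_seed(size=state_size)
    allocate(state(state_size))
    state = scalar_seed
    call random_seed(put=state)
    do i = 1, n_discard
       call random_number(dummy)   ! discarded on purpose
    end do
  end subroutine seed_and_prime
end module rng_priming

program demo_priming
  use rng_priming, only: wp, seed_and_prime
  implicit none
  real(wp) :: a(5), b(5)

  ! Seeding twice with the same scalar is expected to replay the same numbers
  call seed_and_prime(20180815, 100)
  call random_number(a)
  call seed_and_prime(20180815, 100)
  call random_number(b)

  if (all(a == b)) then
     print *, 'same seed reproduced the same sequence'
  else
     print *, 'sequences differ'
  end if
end program demo_priming
```

Keeping the discard count as an argument makes it easy to raise for generators that take longer to settle.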

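From the comments: some compilers try to eliminate bad behavior by processing the input state in some way to produce the actual internal state. Gfortran uses an "XOR cipher". The operation is reversed on get.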
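One way to see whether a given compiler hides such scrambling from the user is to put a seed and immediately get it back. The sketch below only reports what it finds; as far as I know, the standard does not promise that get returns exactly what was passed to put, so either outcome may be legitimate. The program and variable names are illustrative assumptions.

```fortran
program seed_roundtrip
  implicit none
  integer :: n
  integer, allocatable :: seed_in(:), seed_out(:)

  call random_seed(size=n)
  allocate(seed_in(n), seed_out(n))

  seed_in = 20180815
  call random_seed(put=seed_in)    ! compiler may transform this internally
  call random_seed(get=seed_out)   ! ...and may undo the transform here

  if (all(seed_in == seed_out)) then
     print *, 'get returned exactly the values passed to put'
  else
     print *, 'get returned different values than were passed to put'
  end if
end program seed_roundtrip
```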
