与Python3计算的c++中的随机数相同numpy.random.rand



我想在c++中复制一些已经在Python3中实现的代码的测试,这些代码依赖于numpy.random.randrandn值以及特定的种子(例如,seed = 1)。

我知道Python的随机实现是基于Mersenne twister。c++标准库在std::mersenne_twister_engine中也提供了这个功能。

c++版本返回一个unsigned int,而Python的rand是一个浮点值。

是否有一种方法可以在c++中获得与Python中生成的相同的值,并确保它们是相同的?对于由randn生成的数组也是如此?

对于整数值可以这样做:

import numpy as np
np.random.seed(12345)
print(np.random.randint(256**4, dtype='<u4', size=1)[0])
#include <iostream>
#include <random>
int main()
{
std::mt19937 e2(12345);
std::cout << e2() << std::endl;
}

两个片段的结果是3992670690


通过查看rand的源代码,您可以在c++代码中实现它:

import numpy as np
np.random.seed(12345)
print(np.random.rand())
#include <iostream>
#include <iomanip>
#include <random>
int main()
{
std::mt19937 e2(12345);
int a = e2() >> 5;
int b = e2() >> 6;
double value = (a * 67108864.0 + b) / 9007199254740992.0;
std::cout << std::fixed << std::setprecision(16) << value << std::endl;
}

两个随机值均为0.9296160928171479


使用std::generate_canonical比较方便,但是使用另一种方法将梅氏捻线机的输出转换成double。它们不同的原因可能是generate_canonical比NumPy中使用的随机生成器更优化,因为它避免了昂贵的浮点操作,特别是乘法和除法,如源代码所示。然而,它似乎依赖于实现,而NumPy在所有平台上产生相同的结果。

double value = std::generate_canonical<double, std::numeric_limits<double>::digits>(e2);

这不起作用,产生的结果是0.8901547132827379,这与Python代码的输出不同。

为完整起见,并避免重新发明轮子,这里是两个numpy的实现。兰德和傻瓜。c++中的随机数

头文件:

#ifndef RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#define RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#include "RandomNumGenerator.h"

//Uniform distribution - numpy.rand
class RandomNumGen_NumpyCompatible {
public:
RandomNumGen_NumpyCompatible();
RandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);
std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
void seed(std::uint_fast32_t seed);
void discard(unsigned long long);      // NOTE!!  Advances and discards twice as many values as passed in to keep tracking with Numpy order
uint_fast32_t operator()();            //Simply returns the next Mersenne value from the engine
double getDouble();                    //Calculates the next uniformly random double as numpy.rand does
std::string getGeneratorType() const { return "RandomNumGen_NumpyCompatible"; }
private:
std::mt19937 m_mersenneEngine;
};
///////////////////
//Gaussian distribution - numpy.randn
class GaussianRandomNumGen_NumpyCompatible {
public:
GaussianRandomNumGen_NumpyCompatible();
GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);
std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
void seed(std::uint_fast32_t seed);
void discard(unsigned long long);      // NOTE!!  Advances and discards twice as many values as passed in to keep tracking with Numpy order
uint_fast32_t operator()();            //Simply returns the next Mersenne value from the engine
double getDouble();                    //Calculates the next normally (Gaussian) distrubuted random double as numpy.randn does
std::string getGeneratorType() const { return "GaussianRandomNumGen_NumpyCompatible"; }
private:
bool m_haveNextVal;
double m_nextVal;
std::mt19937 m_mersenneEngine;
};
#endif

和实现:

#include "RandomNumGen_NumpyCompatible.h"
RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible()
{
}
RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_mersenneEngine(seed)
{
}
void RandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
m_mersenneEngine.seed(newSeed);
}
void RandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
//Advances and discards TWICE as many values to keep with Numpy order
m_mersenneEngine.discard(2*z);
}
std::uint_fast32_t RandomNumGen_NumpyCompatible::operator()()
{
return m_mersenneEngine();
}
double RandomNumGen_NumpyCompatible::getDouble()
{
int a = m_mersenneEngine() >> 5;
int b = m_mersenneEngine() >> 6;
return (a * 67108864.0 + b) / 9007199254740992.0;
}
///////////////////
GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible()
: m_haveNextVal(false)
{
}
GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_haveNextVal(false), m_mersenneEngine(seed)
{
}
void GaussianRandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
m_mersenneEngine.seed(newSeed);
}
void GaussianRandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
//Burn some CPU cyles here
for (unsigned i = 0; i < z; ++i)
getDouble();
}
std::uint_fast32_t GaussianRandomNumGen_NumpyCompatible::operator()()
{
return m_mersenneEngine();
}
double GaussianRandomNumGen_NumpyCompatible::getDouble()
{
if (m_haveNextVal) {
m_haveNextVal = false;
return m_nextVal;
}
double f, x1, x2, r2;
do {
int a1 = m_mersenneEngine() >> 5;
int b1 = m_mersenneEngine() >> 6;
int a2 = m_mersenneEngine() >> 5;
int b2 = m_mersenneEngine() >> 6;
x1 = 2.0 * ((a1 * 67108864.0 + b1) / 9007199254740992.0) - 1.0;
x2 = 2.0 * ((a2 * 67108864.0 + b2) / 9007199254740992.0) - 1.0;
r2 = x1 * x1 + x2 * x2;
} while (r2 >= 1.0 || r2 == 0.0);
/* Box-Muller transform */
f = sqrt(-2.0 * log(r2) / r2);
m_haveNextVal = true;
m_nextVal = f * x1;
return f * x2;
}

经过一些测试后,当c++ unsigned int除以unsigned int的最大值时,这些值似乎在一个公差范围内(参见下面的@fdermishin的评论):

#include <limits>
...
std::mt19937 generator1(seed);  // mt19937 is a standard mersenne_twister_engine
unsigned val1 = generator1();
std::cout << "Gen 1 random value: " << val1 << std::endl;
std::cout << "Normalized Gen 1: " << static_cast<double>(val1) /  std::numeric_limits<std::uint32_t>::max() << std::endl;

然而,Python的版本似乎跳过了所有其他值。给定以下两个程序:

#!/usr/bin/env python3
import numpy as np
def main():
np.random.seed(1)

for i in range(0, 10):
print(np.random.rand())
###########
# Call main and exit success
if __name__ == "__main__":
main()
sys.exit()

#include <cstdlib>
#include <iostream>
#include <random>
#include <limits>
int main()
{
unsigned seed = 1;
std::mt19937 generator1(seed);  // mt19937 is a standard mersenne_twister_engine
for (unsigned i = 0; i < 10; ++i) {
unsigned val1 = generator1();
std::cout << "Normalized, #" << i << ": " << (static_cast<double>(val1) / std::numeric_limits<std::uint32_t>::max()) << std::endl;
}
return EXIT_SUCCESS;
}

Python程序打印:

0.417022004702574
0.7203244934421581
0.00011437481734488664
0.30233257263183977
0.14675589081711304
0.0923385947687978
0.1862602113776709
0.34556072704304774
0.39676747423066994
0.538816734003357

而c++程序输出:

Normalized, #0: 0.417022
Normalized, #1: 0.997185
Normalized, #2: 0.720324
Normalized, #3: 0.932557
Normalized, #4: 0.000114381
Normalized, #5: 0.128124
Normalized, #6: 0.302333
Normalized, #7: 0.999041
Normalized, #8: 0.146756
Normalized, #9: 0.236089

我可以很容易地跳过c++版本中的所有其他值,这应该会给我与Python版本匹配的数字(在一个容忍范围内)。但是为什么Python的实现似乎会跳过所有其他值,或者c++版本中这些额外的值是从哪里来的?

最新更新