In SciPy, why does custom.rvs() with uniform probabilities only return values from the start of the range?

If I generate an array

custom=np.ones(800, dtype=np.float32)

and then use it to create a custom probability distribution

custom=normalize(custom)[0]
customPDF = stats.rv_discrete(name='pdfX', values=(np.arange(800), custom))

and then call

customPDF.rvs()

the values I get back are all between 0 and 20, whereas I expected random numbers between 0 and 800.

The following code gives me the output I want,

random.uniform(0,800)

but because I need to be able to manipulate the probability distribution by changing the custom array, I have to use customPDF.rvs().

Is there a solution, or an explanation of why this happens?

In [206]: custom=np.ones(800, dtype=np.float32)
In [207]: custom=normalize(custom)[0]
/usr/local/lib/python3.4/dist-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
In [208]: customPDF = stats.rv_discrete(name='pdfX', values=(np.arange(800), custom))
In [209]: customPDF.rvs()
Out[209]: 7
In [210]: customPDF.rvs()
Out[210]: 13
In [211]: customPDF.rvs()
Out[211]: 15
In [212]: customPDF.rvs()
Out[212]: 3
In [213]: customPDF.rvs()
Out[213]: 8
In [214]: customPDF.rvs()
Out[214]: 10
In [215]: customPDF.rvs()
Out[215]: 10
In [216]: customPDF.rvs()
Out[216]: 11
In [217]: customPDF.rvs()
Out[217]: 15
In [218]: customPDF.rvs()
Out[218]: 6
In [219]: customPDF.rvs()
Out[219]: 7
In [220]: random.uniform(0,800)
Out[220]: 707.0265562968543

The problem is this line:

custom=normalize(custom)[0]

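Judging from the warning, normalize appears to be sklearn.preprocessing.normalize. It expects a 2D array of shape [n_samples, n_features]. Because you passed it a 1D vector, it inserts a new dimension and treats the input as a [1, n_features] array (which is why you have to index the 0th element of the output).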

By default, it rescales each row of features so that its L2 (Euclidean) norm equals 1. This is not the same as making the elements sum to 1:

print(normalize(np.ones(800))[0].sum())
# 28.2843
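
The difference between the two kinds of scaling can be checked with NumPy alone. This is a minimal sketch that reproduces the L2 scaling by hand with np.linalg.norm rather than calling sklearn; the names x, l2_scaled and l1_scaled are illustrative and not from the question.

```python
import numpy as np

x = np.ones(800)

# L2 scaling: the vector ends up with unit Euclidean length.
l2_scaled = x / np.linalg.norm(x)

# Sum scaling: the entries end up adding to 1, which is what a
# probability vector needs.
l1_scaled = x / x.sum()

print(np.linalg.norm(l2_scaled))  # Euclidean length is 1
print(l2_scaled.sum())            # sum is sqrt(800), about 28.28
print(l1_scaled.sum())            # sum is 1
```

For 800 equal entries, L2 scaling makes each entry 1/sqrt(800), so the entries sum to 800/sqrt(800) = sqrt(800).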

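Because the sum of custom is much greater than 1, the cumulative probability of the drawn integers reaches 1 long before the end of the probability vector: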

print(custom.cumsum().searchsorted(1))
# 28

As a result, you will never draw an integer greater than 28:

print(customPDF.rvs(size=100000).max())
# 28
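
One way to see why the draws are capped is to sample by inverting the cumulative sum directly. The sketch below is plain inverse-transform sampling written with NumPy. It is an illustration of the idea, not necessarily how rv_discrete.rvs is implemented internally, and the names p_bad, p_good and draw are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw(probabilities, size):
    """Inverse-transform sampling: map uniform draws in [0, 1) to the
    first index whose cumulative probability reaches the draw."""
    cdf = np.cumsum(probabilities)
    u = rng.random(size)
    return np.searchsorted(cdf, u)

# L2-scaled weights: each entry is 1/sqrt(800), so the running total
# passes 1 at index 28 and later indices can never be selected.
p_bad = np.ones(800) / np.sqrt(800)

# Sum-scaled weights: the running total reaches 1 only at the end.
p_good = np.ones(800) / 800

draws_bad = draw(p_bad, 100_000)
draws_good = draw(p_good, 100_000)

print(draws_bad.max())   # never exceeds 28
print(draws_good.max())  # close to 799
```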

To normalize custom correctly, divide it by its sum:

custom /= custom.sum()
# or alternatively:
custom = np.repeat(1./800, 800)
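
Putting it together, here is a minimal end-to-end sketch of the corrected setup. It assumes float64 weights (the question used float32) and a fixed random_state for reproducibility. The names weights, probs and custom_dist are illustrative; only 'pdfX' and the 800-element range come from the question.

```python
import numpy as np
from scipy import stats

# Unnormalized weights; edit these to reshape the distribution.
weights = np.ones(800, dtype=np.float64)

# Divide by the sum so the entries form a valid probability vector.
probs = weights / weights.sum()

custom_dist = stats.rv_discrete(name='pdfX',
                                values=(np.arange(800), probs))

# Draw many samples at once and check that they cover the full range.
samples = custom_dist.rvs(size=10_000, random_state=0)
print(samples.min(), samples.max())
```

Changing the entries of weights (for example, doubling some of them) and renormalizing the same way keeps the probabilities valid, which is what the question needs when manipulating the distribution.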
