Changing the dtype of a view changes the result of numpy functions, but not the values in the array



I want the result of np.mean to be approximately the same before and after the dtype change. The dtype has to remain float32.

import numpy as np

array = np.random.randint(0, high=255, size=(3, 12000, 12000), dtype="int")
array = array[:, 500:10000, 500:10000]
array = array.reshape((-1, 3))
# array.shape is now (90250000, 3)
print(np.mean(array,axis=0),array.dtype) # Nr.1
array = array.astype("float32")
print(np.mean(array,axis=0),array.dtype) # Nr.2

Output of the two print calls:

[127.003107   127.00156286 126.99015613] int32
[47.589664 47.589664 47.589664] float32

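Adding .copy() to the view line has no effect. The size of the view does affect the float32 mean. Changing the slice on both of the last two dimensions to [500:8000] gives: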

[76.35497 76.35497 76.35497] float32

At around [500:5000] and below, the two means are in fact approximately the same.

Changing the code from the reshape line onward:

array= array.reshape((-1,3))
array_float = array.astype("float32")
print(np.all(array_float==array),array.dtype,array_float.dtype)

Result:

True int32 float32

So if the values are the same, why does the result of np.mean differ?

Your array:

In [50]: arr.shape, arr.dtype
Out[50]: ((90250000, 3), dtype('int32'))

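You could get the same thing with np.random.randint(0, high=255, size=(90250000,3),dtype="int"). In fact the size-3 dimension is not needed at all. Either way, it is just a very large number of values in the range [0, 255).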

The expected mean:

In [51]: np.mean(arr, axis=0)
Out[51]: array([126.9822936 , 126.99682718, 126.99214526])

But note what we get if we simply sum the numbers:

In [52]: np.sum(arr, axis=0)
Out[52]: array([-1424749891, -1423438235, -1423860778])

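The int32 sum has overflowed and wrapped around; there are simply too many numbers. So mean must be doing something more than a plain int32 sum divided by the count.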
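
The wrap-around is easy to reproduce at a small scale. The following is only a sketch: the two-element array is made up for illustration and is not taken from the question, and the accumulator dtype is passed explicitly so that the result does not depend on the platform's default integer.

```python
import numpy as np

# Two values whose true sum (2**31) does not fit in a signed 32-bit integer.
small = np.array([2**31 - 1, 1], dtype=np.int32)

wrapped = small.sum(dtype=np.int32)  # int32 accumulator: wraps around silently
exact = small.sum(dtype=np.int64)    # int64 accumulator: enough headroom

print(wrapped, exact)  # -2147483648 2147483648
```

Array reductions do not raise or warn on integer overflow, which is why the negative sums above appear without any message.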

The float32 mean gives the odd values:

In [53]: np.mean(arr.astype('float32'), axis=0)
Out[53]: array([47.589664, 47.589664, 47.589664], dtype=float32)

but float64 matches the int case (although the conversion takes longer):

In [54]: np.mean(arr.astype('float64'), axis=0)
Out[54]: array([126.9822936 , 126.99682718, 126.99214526])

It looks like the floating-point mean is simply doing a sum followed by a division:

In [56]: np.sum(arr.astype('float64'), axis=0)
Out[56]: array([1.14601520e+10, 1.14614637e+10, 1.14610411e+10])    
In [57]: np.sum(arr.astype('float32'), axis=0)
Out[57]: array([4.2949673e+09, 4.2949673e+09, 4.2949673e+09], dtype=float32)
In [58]: Out[56]/arr.shape[0]
Out[58]: array([126.9822936 , 126.99682718, 126.99214526])    
In [59]: Out[57]/arr.shape[0]
Out[59]: array([47.58966533, 47.58966533, 47.58966533])

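The sum is well within the range of float32:

In [60]: np.finfo('float32')
Out[60]: finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)

yet it still fails to produce the correct value. The limit here is precision rather than range: float32 carries only 24 significant bits, and every column sum above has stalled at 4.2949673e+09, which is 2**32.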
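
A likely explanation for the stalled sum, offered here as a sketch rather than a statement about NumPy internals: near 2**32 adjacent float32 values are 512 apart, so adding any element below 256 rounds straight back to the same total. The check below uses only np.spacing and scalar float32 arithmetic; the value 254 is the largest element randint(0, high=255) can produce.

```python
import numpy as np

total = np.float32(2**32)        # the value the float32 sums in Out[57] stalled at
gap = np.spacing(total)          # distance to the next representable float32
stuck = total + np.float32(254)  # add the largest possible element

print(gap)               # 512.0
print(stuck == total)    # True: the addition is lost to rounding
print(2**32 / 90250000)  # compare with Out[59]
print(2**32 / 7500**2)   # compare with the [500:8000] result in the question
```

If this reading is right, it also accounts for the identical values in all three columns, and for the mean changing with the size of the view: the numerator is pinned at 2**32 while the row count varies.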

Note that the Python sum has the same problem as the int version:

In [70]: sum(arr[:,0])
C:UserspaulAppDataLocalTempipykernel_11281456076714.py:1: RuntimeWarning: overflow encountered in long_scalars
sum(arr[:,0])
Out[70]: -1424749891

There is a math.fsum that copes better with a large number of values:

In [71]: math.fsum(arr[:,0])
Out[71]: 11460151997.0

Summing as 64-bit integers also works:

In [72]: np.sum(arr.astype('int64'),axis=0)
Out[72]: array([11460151997, 11461463653, 11461041110], dtype=int64)

From the np.mean docs:

dtype : data-type, optional
Type to use in computing the mean.  For integer inputs, the default
is `float64`; for floating point inputs, it is the same as the
input dtype.
Notes
-----
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has.  Depending on the input data, this can
cause the results to be inaccurate, especially for `float32` (see
example below).  Specifying a higher-precision accumulator using the
`dtype` keyword can alleviate this issue.

Using the dtype parameter:

In [74]: np.mean(arr, axis=0, dtype='int32')
Out[74]: array([-15, -15, -15])
In [75]: np.mean(arr, axis=0, dtype='int64')
Out[75]: array([126, 126, 126], dtype=int64)
In [76]: np.mean(arr, axis=0, dtype='float32')
Out[76]: array([47.589664, 47.589664, 47.589664], dtype=float32)
In [77]: np.mean(arr, axis=0, dtype='float64')
Out[77]: array([126.9822936 , 126.99682718, 126.99214526])

The -15 is explained by:

In [78]: -1424749891/arr.shape[0]
Out[78]: -15.786702393351801
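
To tie these numbers together, the wrapped int32 sum can be reconstructed from the exact int64 sum with plain Python integers. This is a sketch of the two's-complement arithmetic only; the variable names are illustrative, and the truncation toward zero in the last line is an assumption about how the result is cast back to an integer dtype.

```python
true_sum = 11460151997  # first column of Out[72]
n_rows = 90250000       # arr.shape[0]

# Reinterpret the exact sum as a signed 32-bit integer (two's-complement wrap).
wrapped = (true_sum + 2**31) % 2**32 - 2**31

print(wrapped)                # -1424749891, the first column of Out[52]
print(int(wrapped / n_rows))  # -15, after truncating toward zero
```
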
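
In short, if you want accurate results you need to accumulate in float64, either through the default mean dtype for integer input, an explicit dtype argument, or a suitable astype. Using float32 can give trouble, especially with this many elements.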
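
Since the question requires the array itself to stay float32, one option is to keep the storage as it is and only widen the accumulator. The sketch below assumes a smaller array than the one in the question (to keep memory use modest) and uses np.random.default_rng with a fixed seed, neither of which comes from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
ints = rng.integers(0, 255, size=(1_000_000, 3))  # same value range, fewer rows
arr32 = ints.astype(np.float32)                   # storage stays float32

# Accumulate in float64, then cast the small result back to float32.
mean32 = np.mean(arr32, axis=0, dtype=np.float64).astype(np.float32)

print(mean32, mean32.dtype, arr32.dtype)
```

Only the three-element result is ever held in float64 here, so this avoids the cost of converting the whole array with astype('float64').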

Setting the dtype to "float64" solves the problem:

array = np.random.randint(0, high=255, size=(3, 12000, 12000),dtype="int")
array = array[:,500:10000,500:10000]
array= array.reshape((-1,3))
# array.shape is now (90250000, 3)
print(array.mean(axis=0),array.dtype) # Nr.1
array = array.astype("float64")
print(array.mean(axis=0),array.dtype) # Nr.2

Result:

[126.98418438 126.9969912  127.00242922] int32
[126.98418438 126.9969912  127.00242922] float64