I want to know why np.mean does not give roughly the same values before and after the dtype change. The dtype has to stay float32.
import numpy as np
array = np.random.randint(0, high=255, size=(3, 12000, 12000), dtype="int")
array = array[:,500:10000,500:10000]
array= array.reshape((-1,3))
# array.shape is now (90250000, 3)
print(np.mean(array,axis=0),array.dtype) # Nr.1
array = array.astype("float32")
print(np.mean(array,axis=0),array.dtype) # Nr.2
Output of the two print calls:
[127.003107 127.00156286 126.99015613] int32
[47.589664 47.589664 47.589664] float32
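Adding .copy() to the view line has no effect. The size of the view does affect the float32 mean. Changing the slice of both trailing dimensions to [500:8000] gives: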
[76.35497 76.35497 76.35497] float32
At around [500:5000] and below, the two means are roughly the same.
Changing the code from the reshape line onward:
array= array.reshape((-1,3))
array_float = array.astype("float32")
print(np.all(array_float==array),array.dtype,array_float.dtype)
Result:
True int32 float32
So if the values are identical, why do the np.mean results differ?
Your array:
In [50]: arr.shape, arr.dtype
Out[50]: ((90250000, 3), dtype('int32'))
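You could have produced it directly with np.random.randint(0, high=255, size=(90250000,3), dtype="int"). In fact, the size-3 dimension isn't needed at all. Either way, it's just a lot of numbers in the [0, 255) range.
The expected mean: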
In [51]: np.mean(arr, axis=0)
Out[51]: array([126.9822936 , 126.99682718, 126.99214526])
But note what we get if we simply sum those numbers:
In [52]: np.sum(arr, axis=0)
Out[52]: array([-1424749891, -1423438235, -1423860778])
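The int32 sum has overflowed and wrapped around. There are just too many numbers. So mean must be doing something more sophisticated than simply summing in the input dtype and dividing by the count.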
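To make the wraparound concrete, here is a minimal sketch on a two-element array. The variable names are illustrative, and the accumulator is forced to int32 with the dtype argument, because on many platforms np.sum would otherwise promote a small integer type to a wider one.

```python
import numpy as np

# Two int32 values whose true sum (2**31) does not fit in int32.
a = np.array([2**31 - 1, 1], dtype=np.int32)

wrapped = np.sum(a, dtype=np.int32)  # 32-bit accumulator: wraps silently
exact = np.sum(a, dtype=np.int64)    # 64-bit accumulator: correct

print(wrapped, exact)  # -2147483648 2147483648
```

No warning is raised for the array reduction, which is why the negative sums above appear without any complaint.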
Taking the mean of the float32 version gives the funny values:
In [53]: np.mean(arr.astype('float32'), axis=0)
Out[53]: array([47.589664, 47.589664, 47.589664], dtype=float32)
But float64 matches the int case (although the conversion takes longer):
In [54]: np.mean(arr.astype('float64'), axis=0)
Out[54]: array([126.9822936 , 126.99682718, 126.99214526])
It looks like mean for floats is just doing the sum and the division:
In [56]: np.sum(arr.astype('float64'), axis=0)
Out[56]: array([1.14601520e+10, 1.14614637e+10, 1.14610411e+10])
In [57]: np.sum(arr.astype('float32'), axis=0)
Out[57]: array([4.2949673e+09, 4.2949673e+09, 4.2949673e+09], dtype=float32)
In [58]: Out[56]/arr.shape[0]
Out[58]: array([126.9822936 , 126.99682718, 126.99214526])
In [59]: Out[57]/arr.shape[0]
Out[59]: array([47.58966533, 47.58966533, 47.58966533])
The sum is well within the float32 range:
In [60]: np.finfo('float32')
Out[60]: finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)
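Yet it still fails to get the right value. The limit here is float32's precision (a 24-bit significand), not its range.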
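A likely explanation, sketched below, is that the float32 running sum stalls at 2**32 (which prints as 4.2949673e+09, the value in Out[57]). At that magnitude neighbouring float32 values are 512 apart, so adding any element below 256 rounds back to the same number. This assumes the reduction along axis 0 accumulates row by row, which is consistent with the outputs above but is not something the documentation guarantees. The names big, gap, absorbed and predicted are illustrative.

```python
import numpy as np

big = np.float32(2**32)

# Distance from 2**32 to the next representable float32.
gap = np.spacing(big)
print(gap)  # 512.0

# Adding the largest possible element (254) does not change the sum.
absorbed = (big + np.float32(254)) == big
print(absorbed)  # True

# If the sum is stuck at 2**32, the "mean" is just 2**32 / row_count.
predicted = {n: 2**32 / n for n in (9500 * 9500, 7500 * 7500)}
for n, value in predicted.items():
    print(n, value)  # about 47.59 for 90250000 rows, about 76.35 for 56250000
```

Both predicted values match the float32 means reported for the [500:10000] and [500:8000] slices. For [500:5000] the true sum (roughly 127 * 20250000) stays below 2**32, so the float32 mean is still close to the correct one.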
Note that Python's sum has the same problem with the int version:
In [70]: sum(arr[:,0])
C:UserspaulAppDataLocalTempipykernel_11281456076714.py:1: RuntimeWarning: overflow encountered in long_scalars
sum(arr[:,0])
Out[70]: -1424749891
There is a math.fsum that copes better with a large number of values:
In [71]: math.fsum(arr[:,0])
Out[71]: 11460151997.0
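As a small illustration of why math.fsum sidesteps the overflow: it converts every element to a Python float (a 64-bit double) and tracks the partial sums exactly, so the int32 limit never comes into play. The two-element array below is a made-up example, not the data from the question.

```python
import math

import numpy as np

# int32 values whose true sum (2**31) does not fit in int32.
col = np.array([2**31 - 1, 1], dtype=np.int32)

total = math.fsum(col)  # each element is converted to a double first
print(total)  # 2147483648.0
```

The price is speed: math.fsum iterates element by element in Python, so it is much slower than a NumPy reduction on 90 million rows.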
Summing as 64-bit integers also works:
In [72]: np.sum(arr.astype('int64'),axis=0)
Out[72]: array([11460151997, 11461463653, 11461041110], dtype=int64)
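These exact sums also account for the negative int32 results in Out[52]: reducing each int64 sum to a signed 32-bit value reproduces them. The check below uses plain Python integers; wrap_int32 is a hypothetical helper written for this illustration.

```python
def wrap_int32(value: int) -> int:
    """Reduce an integer to the signed 32-bit range, as two's complement wraparound does."""
    return (value + 2**31) % 2**32 - 2**31


exact_sums = [11460151997, 11461463653, 11461041110]   # Out[72]
int32_sums = [-1424749891, -1423438235, -1423860778]   # Out[52]

wrapped = [wrap_int32(s) for s in exact_sums]
print(wrapped == int32_sums)  # True
```

Each exact sum is a little under 3 * 2**32, so the int32 accumulator wrapped around three times.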
From the np.mean docs:
dtype : data-type, optional
Type to use in computing the mean. For integer inputs, the default
is `float64`; for floating point inputs, it is the same as the
input dtype.
Notes
-----
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has. Depending on the input data, this can
cause the results to be inaccurate, especially for `float32` (see
example below). Specifying a higher-precision accumulator using the
`dtype` keyword can alleviate this issue.
Playing with the dtype parameter:
In [74]: np.mean(arr, axis=0, dtype='int32')
Out[74]: array([-15, -15, -15])
In [75]: np.mean(arr, axis=0, dtype='int64')
Out[75]: array([126, 126, 126], dtype=int64)
In [76]: np.mean(arr, axis=0, dtype='float32')
Out[76]: array([47.589664, 47.589664, 47.589664], dtype=float32)
In [77]: np.mean(arr, axis=0, dtype='float64')
Out[77]: array([126.9822936 , 126.99682718, 126.99214526])
The -15 is explained by:
In [78]: -1424749891/arr.shape[0]
Out[78]: -15.786702393351801
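In sum, if you want accurate results you need to use float64, either via the default mean dtype (for integer input) or via an appropriate astype. Using float32 can cause problems, especially with this many elements.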
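Since the question requires the data to stay float32, one option suggested by the docs quoted above is to leave the array as float32 and request a float64 accumulator only for the reduction. This is a minimal sketch on a small random array; the names rng, arr32, mean64 and mean32 are illustrative, and the array is far too small to show the failure itself.

```python
import numpy as np

rng = np.random.default_rng(0)
arr32 = rng.integers(0, 255, size=(1000, 3)).astype(np.float32)

# The data stays float32; only the accumulation happens in float64.
mean64 = np.mean(arr32, axis=0, dtype=np.float64)

# Cast the (tiny) result back if downstream code expects float32.
mean32 = mean64.astype(np.float32)

print(arr32.dtype, mean64.dtype, mean32.dtype)  # float32 float64 float32
```

This avoids creating a float64 copy of the whole array, which matters at the sizes in the question.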
Setting the dtype to "float64" solves the problem:
array = np.random.randint(0, high=255, size=(3, 12000, 12000),dtype="int")
array = array[:,500:10000,500:10000]
array= array.reshape((-1,3))
# array.shape is now (90250000, 3)
print(array.mean(axis=0),array.dtype) # Nr.1
array = array.astype("float64")
print(array.mean(axis=0),array.dtype) # Nr.2
Result:
[126.98418438 126.9969912 127.00242922] int32
[126.98418438 126.9969912 127.00242922] float64