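pycuda's gpuarray.dot() does not do the same thing as numpy.dot(). Is this intentional?
For example, the code below runs numpy.dot() and then gpuarray.dot(). The former returns a 5x5 array, the latter a single number.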
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
np.random.seed(1)
print("\nNUMPY: result of np.dot - OK")
a = np.array(2 * np.random.random((5, 5)) - 1)
b = np.array(2 * np.random.random((5, 5)) - 1)
a_b_dot = np.dot(a, b)
print (type(a_b_dot), a_b_dot.shape)
print (a_b_dot)
print("\nPYCUDA: result of gpuarray.dot - NOT OK")
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
a_b_dot = gpuarray.dot(a_gpu, b_gpu)
print (type(a_b_dot), a_b_dot.shape)
print (a_b_dot)
The output is:
NUMPY: result of np.dot - OK
<class 'numpy.ndarray'> (5, 5)
[[-0.4289689 -1.07826831 0.35264673 1.17316284 0.37989478]
[-0.23539466 0.62140658 0.02890465 0.64194572 -0.90554719]
[ 0.6308665 -0.5418927 0.15072667 1.53949101 -0.17648109]
[-0.28165967 -1.06345895 0.17784186 -0.50902276 1.27061422]
[ 0.15769648 0.01993701 -0.42621895 -0.07254009 -0.23463897]]
PYCUDA: result of gpuarray.dot - NOT OK
<class 'pycuda.gpuarray.GPUArray'> ()
-0.3611777016515303
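I'm fairly sure pycuda does this deliberately. gpuarray.dot() treats both operands as flat vectors and returns their inner product as a single number. The simplest way to provide this functionality is to apply the dot product across the whole array, and to leave end users to find and use their own matrix multiplication library if they need anything more advanced.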
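To make the difference concrete without a GPU, here is a minimal numpy-only sketch. It assumes, as far as I understand the pycuda behaviour, that gpuarray.dot() reduces the element-wise products of the two arrays to one number; the names flat_dot and mat_prod are just illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 2 * rng.random((5, 5)) - 1
b = 2 * rng.random((5, 5)) - 1

# Flat inner product: sum of element-wise products over all 25 entries.
# Assumption: this is the quantity gpuarray.dot() returns for these inputs.
flat_dot = np.dot(a.ravel(), b.ravel())

# Matrix product: what np.dot returns for two 2-D inputs.
mat_prod = np.dot(a, b)

print(np.shape(flat_dot), mat_prod.shape)
print(np.isclose(flat_dot, np.sum(a * b)))
```

The first value is a 0-d scalar and the second is a (5, 5) array, which matches the shapes seen in the question.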
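If what you actually want is a dot product on matrices in the numpy sense, you just use matrix multiplication. The following example shows that np.dot and np.matmul give the same result for 2-D inputs: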
import numpy as np
print("\nNUMPY: result of np.dot - OK")
a = np.array(2 * np.random.random((5, 5)) - 1)
b = np.array(2 * np.random.random((5, 5)) - 1)
a_b_dot = np.dot(a, b)
a_mul_b = np.matmul(a, b)
print(type(a_b_dot), a_b_dot.shape)
print(a_b_dot)
print(type(a_mul_b), a_mul_b.shape)
print(a_mul_b)
NUMPY: result of np.dot - OK
<class 'numpy.ndarray'> (5, 5)
[[-0.12441477 -0.28175903 0.36632673 0.35687491 -0.25773564]
[-0.57845471 -0.4097741 0.3505651 -0.23822489 1.17375904]
[-0.19920533 -0.43918224 0.62438656 0.6326451 -0.27798801]
[ 0.67128494 0.44472894 -0.57700879 -0.57246653 -0.0336262 ]
[ 0.49149948 -0.65774616 1.09320886 0.76179777 -0.76590202]]
<class 'numpy.ndarray'> (5, 5)
[[-0.12441477 -0.28175903 0.36632673 0.35687491 -0.25773564]
[-0.57845471 -0.4097741 0.3505651 -0.23822489 1.17375904]
[-0.19920533 -0.43918224 0.62438656 0.6326451 -0.27798801]
[ 0.67128494 0.44472894 -0.57700879 -0.57246653 -0.0336262 ]
[ 0.49149948 -0.65774616 1.09320886 0.76179777 -0.76590202]]
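To perform real matrix multiplication on the GPU you will either (a) need to implement your own, or (b) use scikit-cuda. Both options depend on and interact with pycuda.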
In scikit-cuda this is nearly identical to numpy (taken straight from the scikit-cuda docs):
>>> import pycuda.autoinit
>>> import pycuda.gpuarray as gpuarray
>>> import numpy as np
>>> import skcuda.linalg as linalg
>>> import skcuda.misc as misc
>>> linalg.init()
>>> a = np.asarray(np.random.rand(4, 2), np.float32)
>>> b = np.asarray(np.random.rand(2, 2), np.float32)
>>> a_gpu = gpuarray.to_gpu(a)
>>> b_gpu = gpuarray.to_gpu(b)
>>> c_gpu = linalg.dot(a_gpu, b_gpu)
>>> np.allclose(np.dot(a, b), c_gpu.get())
True
>>> d = np.asarray(np.random.rand(5), np.float32)
>>> e = np.asarray(np.random.rand(5), np.float32)
>>> d_gpu = gpuarray.to_gpu(d)
>>> e_gpu = gpuarray.to_gpu(e)
>>> f = linalg.dot(d_gpu, e_gpu)
>>> np.allclose(np.dot(d, e), f)
True
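Under the hood, scikit-cuda uses the CUDA shared-library (DLL) backend and converts things to ctypes and so on, so you will notice that the multiplication is much lower level than in numpy. Like numpy in most cases, it sticks to 2 dimensions. If you really need to multiply matrices stored inside n-d arrays, each product is still 2-D, but you can use the backend's batched functions or mdot.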
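As a CPU-side reference for what "batched" and "mdot" mean here, the sketch below uses only numpy. It is not the scikit-cuda API; it just shows the two patterns (one 2-D product per slice of a stack, and a chained product of several 2-D matrices) that you could compare GPU results against. All variable names are illustrative.

```python
from functools import reduce

import numpy as np

rng = np.random.default_rng(0)

# A stack of 3 independent (4, 2) @ (2, 5) products.
a = rng.random((3, 4, 2)).astype(np.float32)
b = rng.random((3, 2, 5)).astype(np.float32)

# np.matmul broadcasts over the leading axis: one 2-D product per slice.
batched = np.matmul(a, b)

# The same computation written as an explicit loop of 2-D products.
looped = np.stack([np.dot(a[i], b[i]) for i in range(a.shape[0])])

# Chained product of several 2-D matrices, the CPU analogue of an mdot call.
m1, m2, m3 = rng.random((4, 2)), rng.random((2, 3)), rng.random((3, 5))
chained = reduce(np.dot, [m1, m2, m3])

print(batched.shape, np.allclose(batched, looped), chained.shape)
```

Each slice is still an ordinary 2-D multiplication; batching only changes how many of them are issued at once.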