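I want to perform image segmentation using a maximum likelihood algorithm implemented in Python. The mean vector and covariance matrix of each class are known, and by iterating over the images (they are quite large... 5100x7020) we can compute, for each pixel, the probability that it belongs to a given class.
Written naively in Python: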
import numpy as np
from numpy.linalg import inv
from numpy.linalg import det
...
probImage1 = []
probImage1Vector = []
norm = 1.0 / (np.power((2*np.pi), i/2) * np.sqrt(np.linalg.det(covMatrixClass1)))
covMatrixInverz = np.linalg.inv(covMatrixClass1)
for x in xrange(x_img):
    for y in xrange(y_img):
        X = realImage[x,y]
        pixelValueDifference = X - meanVectorClass1
        mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
        mult2 = np.dot(covMatrixInverz,pixelValueDifference)
        multMult = np.dot(mult1,mult2)
        expo = np.exp(multMult)
        probImage1Vector.append(np.multiply(norm,expo))
    probImage1.append(probImage1Vector)
    probImage1Vector = []
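The problem is that this code is very slow on large images... Calculations such as the vector subtraction and multiplication consume a lot of time, even though they only involve 1x3 vectors.
Could you give me a hint on how to speed this code up? I would really appreciate it. Sorry if I am not being clear, I am still a beginner in Python.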
Taking a closer look at:
mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
mult2 = np.dot(covMatrixInverz,pixelValueDifference)
multMult = np.dot(mult1,mult2)
We see that the operation is basically:
A.T (d) C (d) A # where `(d)` is the dot-product
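For reference, the same expression in conventional notation. This is only a sketch of the standard multivariate Gaussian density. It assumes that `A` stands for `pixelValueDifference` (that is, x - μ), that `C` stands for `covMatrixInverz` (Σ⁻¹), and that the `i` in the question's `norm` is the number of bands d. None of these symbols appear in the original post.

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma\rvert^{1/2}}
\exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}}\,
\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right)
```

The quadratic form inside the exponential is the `A.T (d) C (d) A` part, scaled by -0.5.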
Those three steps can easily be expressed with np.einsum in a single string notation, like so -
np.einsum('k,lk,l->',pA,covMatrixInverz,-0.5*pA)
Extending this over the iterators i (=x) and j (=y), we get a fully vectorized expression -
np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA)
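To make the subscript string less opaque, here is a small sanity check that compares the one-call `einsum` against the per-pixel three-step computation. It is a minimal sketch: the 4x5 image, the covariance values and the names `expected` and `vectorized` are made up for illustration, and it uses Python 3 syntax.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small inputs: a 4x5 image with 3 bands.
pA = rng.random((4, 5, 3))  # per-pixel difference from the class mean
cov = np.array([[2.0, 0.3, 0.1],
                [0.3, 1.5, 0.2],
                [0.1, 0.2, 1.0]])
covMatrixInverz = np.linalg.inv(cov)

# Reference: the three-step computation, one pixel at a time.
expected = np.empty(pA.shape[:2])
for i in range(pA.shape[0]):
    for j in range(pA.shape[1]):
        d = pA[i, j]
        expected[i, j] = np.dot(-0.5 * d, np.dot(covMatrixInverz, d))

# Same reduction over k and l for every pixel (i, j) in a single call.
vectorized = np.einsum('ijk,lk,ijl->ij', pA, covMatrixInverz, -0.5 * pA)

print(np.allclose(expected, vectorized))
```

The indices `k` and `l` do not appear after `->`, so they are summed over, while `i` and `j` are kept as the output axes.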
Alternatively, we can perform the first part of the sum-reduction with np.tensordot -
mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
output = np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized)
Benchmarking
Listing all the approaches as functions -
# Original code posted by OP to return array
def org_app(meanVectorClass1, realImage, covMatrixInverz, norm):
    probImage1 = []
    probImage1Vector = []
    x_img, y_img = realImage.shape[:2]
    for x in xrange(x_img):
        for y in xrange(y_img):
            X = realImage[x,y]
            pixelValueDifference = X - meanVectorClass1
            mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
            mult2 = np.dot(covMatrixInverz,pixelValueDifference)
            multMult = np.dot(mult1,mult2)
            expo = np.exp(multMult)
            probImage1Vector.append(np.multiply(norm,expo))
        probImage1.append(probImage1Vector)
        probImage1Vector = []
    return np.asarray(probImage1).reshape(x_img,y_img)

def vectorized(meanVectorClass1, realImage, covMatrixInverz, norm):
    pA = realImage - meanVectorClass1
    mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
    return np.exp(np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized))*norm

def vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm):
    pA = realImage - meanVectorClass1
    return np.exp(np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA))*norm
Timings -
In [19]: # Setup inputs
    ...: meanVectorClass1 = np.array([23.96000000, 58.159999, 61.5399])
    ...:
    ...: covMatrixClass1 = np.array([[ 514.20040404, 461.68323232, 364.35515152],
    ...:                             [ 461.68323232, 519.63070707, 446.48848485],
    ...:                             [ 364.35515152, 446.48848485, 476.37212121]])
    ...: covMatrixInverz = np.linalg.inv(covMatrixClass1)
    ...:
    ...: norm = 0.234 # Random float number
    ...: realImage = np.random.rand(1000,2000,3)
    ...:

In [20]: out1 = org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
    ...: out2 = vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
    ...: out3 = vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
    ...: print np.allclose(out1, out2)
    ...: print np.allclose(out1, out3)
    ...:
True
True
In [21]: %timeit org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 27.8 s per loop
In [22]: %timeit vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 182 ms per loop
In [23]: %timeit vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 275 ms per loop
It looks like the fully vectorized einsum + tensordot hybrid solution is doing quite well!
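Since the stated goal is segmentation, the hybrid approach could be wrapped per class and followed by an argmax over classes. The following is a minimal sketch under assumptions, not the original poster's code: `class_likelihood` and `segment` are hypothetical names, the two classes and the tiny 2x2 image are invented, and the normalisation is computed from the covariance with `d / 2.0` to avoid integer division.

```python
import numpy as np

def class_likelihood(image, mean, cov):
    """Gaussian density of every pixel of `image` (H x W x d) for one class."""
    d = image.shape[-1]
    cov_inv = np.linalg.inv(cov)
    norm = 1.0 / (np.power(2 * np.pi, d / 2.0) * np.sqrt(np.linalg.det(cov)))
    diff = image - mean  # broadcasts the mean over all H x W pixels
    # Contract the band axis of diff with the second axis of cov_inv.
    tmp = np.tensordot(diff, cov_inv, axes=([2], [1]))
    return norm * np.exp(np.einsum('ijk,ijk->ij', -0.5 * diff, tmp))

def segment(image, means, covs):
    """Label each pixel with the index of the most likely class."""
    probs = np.stack([class_likelihood(image, m, c)
                      for m, c in zip(means, covs)])
    return np.argmax(probs, axis=0)

# Hypothetical two-class example on a 2x2 image with 3 bands.
means = [np.array([0.2, 0.2, 0.2]), np.array([0.8, 0.8, 0.8])]
covs = [np.eye(3) * 0.01, np.eye(3) * 0.01]
image = np.zeros((2, 2, 3))
image[0, :] = 0.25  # close to the mean of class 0
image[1, :] = 0.75  # close to the mean of class 1

labels = segment(image, means, covs)
print(labels)
```

In this toy case the first row is labelled 0 and the second row 1. For an image as large as 5100x7020, the intermediate arrays are the same size as the image, so memory use is worth keeping an eye on.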
For a further performance boost, one could also look into the numexpr module to speed up the exponential computation on large arrays.
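A hedged sketch of what that could look like: `numexpr.evaluate` takes the expression as a string and evaluates it over whole arrays. The helper name `scaled_exp` is hypothetical, and the code falls back to plain NumPy when numexpr is not installed. Whether it is actually faster depends on the array size and the machine, so it should be timed before being relied on.

```python
import numpy as np

try:
    import numexpr as ne  # optional dependency
except ImportError:
    ne = None

def scaled_exp(q, norm):
    """Return norm * exp(q), using numexpr when it is available."""
    if ne is not None:
        # Pass the operands explicitly instead of relying on frame lookup.
        return ne.evaluate('norm * exp(q)', local_dict={'q': q, 'norm': norm})
    return norm * np.exp(q)

# Hypothetical stand-in for the output of the einsum step (all values <= 0).
q = -np.random.default_rng(0).random((1000, 2000))
out = scaled_exp(q, 0.234)
print(np.allclose(out, 0.234 * np.exp(q)))
```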
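As a first step, I would get rid of the unnecessary function calls such as transpose, dot, and multiply. These are all simple calculations that you should do inline. When you can actually see what you are doing, rather than having it hidden inside functions, it becomes easier to understand the performance problem.
The fundamental issue here is that this appears to be at least a quartic-complexity operation. You may want to simply multiply out the number of operations you perform across all of your loops. Is it 500 million, 2 billion, 350 billion? How many?
To get control of performance, you need to understand how many instructions you are executing. A modern computer can execute on the order of 1 billion instructions per second, but if memory movement is involved it can be much slower.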
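As a rough illustration of multiplying out the operation count, here is a back-of-the-envelope tally for one class. It assumes the 5100 x 7020 image size from the question and 3 bands per pixel. The per-pixel breakdown is my own approximate accounting of the arithmetic only and ignores interpreter and function-call overhead.

```python
# Rough count of arithmetic operations for one class, assuming a
# 5100 x 7020 image with 3 bands per pixel.
rows, cols, bands = 5100, 7020, 3

pixels = rows * cols

per_pixel = (
    bands                                  # X - mean
    + bands * bands + bands * (bands - 1)  # inverse covariance times vector
    + bands                                # scaling by -0.5
    + bands + (bands - 1)                  # final dot product
    + 2                                    # one exp and one multiply by norm
)

total = pixels * per_pixel
print(pixels, per_pixel, total)
```

Under these assumptions the arithmetic comes to roughly a billion operations per class.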