I'm completely new to Python multiprocessing and a bit overwhelmed by the amount of material online, so I'm hoping for a more concrete approach here. My code is below. The two functions, forward and backward, are computationally very expensive: on my input dataset, each one takes about 13 minutes. I'd like to compute the two matrices at the same time (the forward and backward calls, i.e. lines 3 and 4 of code in the decode() function). From some online tutorials I gather that multiprocessing.Process could do this, but I don't know how to get the resulting numpy arrays back. I know there are things like Queue and Array, but they seem quite restrictive and don't look like a good fit here. Thanks in advance!
```python
def forward(self, emis):
    # Given the observed haplotype, compute its forward matrix
    f = np.full((self.n1+self.n2, self.numSNP), np.nan)
    # initialization
    f[:,0] = (-math.log(self.n1+self.n2) + emis[0]).flatten()
    # fill in forward matrix
    for j in range(1, self.numSNP):
        T = self.transition(self.D[j])
        # using axis=0, logsumexp sums over each column of the transition matrix (i.e. over previous states)
        f[:, j] = emis[j] + logsumexp(f[:,j-1][:,np.newaxis] + T, axis=0)
    return f

#@profile
def backward(self, emis):
    # Given the observed haplotype, compute its backward matrix
    b = np.full((self.n1+self.n2, self.numSNP), np.nan)
    # initialization
    b[:, self.numSNP-1] = np.full(self.n1+self.n2, 0)
    for j in range(self.numSNP-2, -1, -1):
        T = self.transition(self.D[j+1])
        b[:,j] = logsumexp(T + emis[j+1] + b[:,j+1], axis=1)
    return b

#@profile
def decode(self, obs):
    # infer hidden state of each SNP site in the given haplotype
    # state[j] = 0 means site j was most likely copied from population 1
    # and state[j] = 1 means site j was most likely copied from population 2
    start = time.time()
    emis = self.emissionALL(obs)
    f = self.forward(emis)
    b = self.backward(emis)
    end = time.time()
    print(f'uncached version takes time {end-start}')
    print(f'forward probability:{logsumexp(f[:,-1])}')
    print(f'backward probability:{logsumexp(-math.log(self.n1+self.n2)+emis[0]+b[:,0])}')
    return 0
```
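If you are only working with matrices, I'm not sure which limitation of Array you have in mind. The following is incomplete, but it illustrates the idea: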
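import multiprocessing
from multiprocessing.sharedctypes import RawArray

import numpy as np

# make some empty shared arrays of C doubles ('d' == np.float64); X_size is the number of elements
yourMat = RawArray('d', X_size)
resultMat = RawArray('d', X_size)
...
# run backward in a child process; it receives the shared buffers and writes into resultMat
ptemp = multiprocessing.Process(target=backward, args=(yourMat, resultMat))
ptemp.daemon = True
ptemp.start()
...
# in the worker (e.g. inside backward): view the shared buffer as a NumPy array without copying
data = np.frombuffer(yourMat, dtype=np.float64)
# do something with data
resultMat[i:j] = data
...
# back in the parent: wait for the child to finish before reading its output
ptemp.join()
# get the data, reading it with the same dtype it was created with ('d' -> float64)
results = np.frombuffer(resultMat, dtype=np.float64)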
For a complete example, see this post: Use numpy array in shared memory for multiprocessing