Need help speeding up Python code that counts shifted coincidences



I previously posted a question about trying to make this code more efficient. The purpose of the code is simply to find the average number of "coincidences" between two NumPy arrays holding the time values of two channels. A "coincidence" occurs when there is at least one time value from each of the two channels within a given time interval. I break the long arrays into chunks of ~(10k-100k) values, bin each chunk, and count coincidences between like bins, i.e. between (bin #n, chunk 1 of array1) and (bin #n, chunk 1 of array2). Here is a link to that question: Need help speeding up numpy code that finds 'coincidences' between two NumPy arrays.
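For reference, here is a minimal NumPy sketch of that definition on toy data (the helper name count_coincidences and the random timestamps are mine, not part of the original code):

import numpy as np

def count_coincidences(t1, t2, num_bins):
    # Bin both channels onto one grid spanning their combined range.
    lo = min(t1[0], t2[0])
    hi = max(t1[-1], t2[-1])
    h1, _ = np.histogram(t1, bins=num_bins, range=(lo, hi))
    h2, _ = np.histogram(t2, bins=num_bins, range=(lo, hi))
    # A coincidence is a bin that is occupied in both channels.
    return np.sum((h1 > 0) & (h2 > 0))

t1 = np.sort(np.random.uniform(0.0, 1.0, 50))  # toy channel-1 times
t2 = np.sort(np.random.uniform(0.0, 1.0, 50))  # toy channel-2 times
print(count_coincidences(t1, t2, 100))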

Following the suggestions of @AboAmmar and @Jerome Richard there, I modified the code and significantly improved its speed.
Now I have to add an extra step: counting shifted coincidences, i.e. between bin #n and bin #(n+shift), and doing this for different shift values. The code is faster than before, but still too slow for the amount of data I need to analyze.
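In other words, for a shift s the count compares bin #n of channel 1 with bin #(n+s) of channel 2. A minimal sketch of that comparison, assuming h1 and h2 are occupancy histograms like those in the sketch above (count_shifted is my name, not from the original code):

def count_shifted(h1, h2, shift):
    # Align bin n of h1 with bin n+shift of h2 via slicing; the trailing
    # 'shift' bins of h1 and the leading 'shift' bins of h2 drop out.
    if shift == 0:
        return np.sum((h1 > 0) & (h2 > 0))
    return np.sum((h1[:-shift] > 0) & (h2[shift:] > 0))

This is equivalent to dropping the first 'shift' bins of h2 and zero-padding it back to full length, which is what the code below does with np.append, but without allocating new arrays.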

Here is how I did it:

import numpy as np
import matplotlib.pyplot as plt
from numba import njit

@njit
def numba_histogram(a, b, n):
    # Histogram both chunks onto one common grid of n bins spanning
    # the combined range of the two channels.
    hista, histb = np.zeros(n, dtype=np.intp), np.zeros(n, dtype=np.intp)
    a_min, a_max = min(a[0], b[0]), max(a[-1], b[-1])
    for x, y in zip(a, b):  # note: zip stops at the end of the shorter chunk
        bin = n * (x - a_min) / (a_max - a_min)
        if x == a_max:
            hista[n - 1] += 1  # put the top edge into the last bin
        elif bin >= 0 and bin < n:
            hista[int(bin)] += 1
        bin = n * (y - a_min) / (a_max - a_min)
        if y == a_max:
            histb[n - 1] += 1
        elif bin >= 0 and bin < n:
            histb[int(bin)] += 1
    return hista, histb
@njit
def calc_coincidence(t1, t2, NumChunks, NumofBins, td):
    CoincAllChunks = 0
    last = 0.0
    avg = len(t1) / NumChunks

    while last < len(t1):
        t1chunks = t1[int(last):int(last + avg)]
        t2chunks = t2[int(last):int(last + avg)]
        if (len(t1chunks) <= 1) or (len(t2chunks) <= 1):
            last += avg
        else:
            Hist1, Hist2 = numba_histogram(t1chunks, t2chunks, NumofBins)
            # Shift channel 2 by td bins: drop its first td bins and pad
            # with zeros so both histograms keep the same length.
            Hist2 = Hist2[td:]
            appendthis = np.zeros(td)
            Hist2 = np.append(Hist2, appendthis)
            # Count bins occupied in both (shifted) histograms.
            CoincOneChunk = np.sum((Hist1 > 0) * (Hist2 > 0))
            CoincAllChunks += CoincOneChunk
            last += avg

    return CoincAllChunks
##### NumChunks, BinSize, Num of Delays #########
NumChunks = 10
HowManyUnitsofns = 100
BinSize = HowManyUnitsofns * 10**(-9)  # bin width in seconds (here 100 ns)
NumberofDelays = 10
##################################################
NumberofFiles = 10
# Initial zero row; it does not affect the column sums taken below.
CoinAllDelaysAllFiles = np.zeros(NumberofDelays, dtype=np.int64)
ListofCoinforDiffDelays = []
for n in range(NumberofFiles):
    specificfile = 'file{}.h5'.format(n)
    t1, t2, DataTime = ReadFile(specificfile)

    #################### Normalization ################
    ChunkSize = DataTime / NumChunks
    NumofBins = int(ChunkSize / BinSize)
    Singles1 = len(t1)
    Singles2 = len(t2)
    # Expected number of accidental coincidences over the whole record.
    Normalization = (Singles1 * Singles2 * BinSize) / DataTime
    ###################################################

    CoinAllDelaysOneFile = []
    for i in range(NumberofDelays):
        NormCoinOneDelay = calc_coincidence(t1, t2, NumChunks, NumofBins, i) / Normalization
        CoinAllDelaysOneFile = np.append(CoinAllDelaysOneFile, NormCoinOneDelay)

    CoinAllDelaysAllFiles = np.vstack((CoinAllDelaysAllFiles, CoinAllDelaysOneFile))

# Average each delay column over all files.
for k in range(NumberofDelays):
    coineachdelay = np.sum(CoinAllDelaysAllFiles[:, k])
    ListofCoinforDiffDelays = np.append(ListofCoinforDiffDelays, coineachdelay)
NormListofCoinforDiffDelays = ListofCoinforDiffDelays / NumberofFiles

print(NormListofCoinforDiffDelays)
tau = HowManyUnitsofns * np.arange(NumberofDelays)  # delay axis in ns
plt.scatter(tau, NormListofCoinforDiffDelays)
plt.ylim(0, 2)

What can I do to make this faster? Any suggestion would be greatly appreciated. Thank you!

Edit: I am basically looking at how the number of coincidences changes with the shift value, so I count the coincidences for each shift and plot them against the shift value. Data to play with: https://www.filehosting.org/file/details/8168796/To%20upload%20on%20StackOverflow.zip

This may not solve your problem, but parallelizing numba_histogram might help. I would suggest writing it like this:

import numba
import numpy as np
from numba import njit, prange

@njit
def slice_hist(x, n, hmin, hmax):
    # Histogram one slice of the data onto the shared (hmin, hmax) grid.
    hist = np.zeros(n, dtype=np.intp)
    for v in x:
        bin = n * (v - hmin) / (hmax - hmin)
        if v == hmax:
            hist[n - 1] += 1
        elif bin >= 0 and bin < n:
            hist[int(bin)] += 1
    return hist

@njit(parallel=True)
def numba_histogram(a, b, n):
    hista, histb = np.zeros(n, dtype=np.intp), np.zeros(n, dtype=np.intp)
    ntc = numba.get_num_threads()  # current thread count; callable inside jitted code
    hist_min, hist_max = min(a[0], b[0]), max(a[-1], b[-1])
    for i in prange(ntc):
        # Slice bounds computed by hand rather than with np.split, so lengths
        # that are not divisible by the thread count also work.
        hista += slice_hist(a[i * len(a) // ntc:(i + 1) * len(a) // ntc], n, hist_min, hist_max)
        histb += slice_hist(b[i * len(b) // ntc:(i + 1) * len(b) // ntc], n, hist_min, hist_max)
    return hista, histb

Depending on your machine and the chunk sizes, limiting the number of threads (numba.config.NUM_THREADS) may actually yield better performance. You could optimize further by restricting the range of the histogram in slice_hist, though I am not so sure about that.
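A minimal sketch of one way to cap the thread count (the value 4 is only an illustration; numba.set_num_threads is the runtime API whose value numba.get_num_threads() in the sketch above picks up):

import numba

numba.set_num_threads(4)  # use at most 4 threads in prange loops
hista, histb = numba_histogram(t1chunks, t2chunks, NumofBins)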
