Lazy interleaving of Dask arrays



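I need to interleave two large HDF5 datasets frame by frame. They hold the video frames of two channels from a microscopy measurement. I thought Dask would be a good fit for this job and for the downstream processing.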

Both arrays have the same shape and dtype. Based on the question "Interleaving two numpy arrays", I can do this with NumPy for arrays that fit in memory:

import numpy as np
# a numpy example of channel 1 data
ch1 = np.arange(1,5)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))
# channel 2 has the same shape and dtype
ch2 = np.arange(10,50,10)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))
# the interleaving starts by allocating a new array whose first dimension is doubled
ch1_2 = np.empty((2*ch1.shape[0],*ch1.shape[1:]), dtype=ch1.dtype)
# two slice assignments take care of the interleaving
ch1_2[0::2] = ch1
ch1_2[1::2] = ch2

Unfortunately, this does not work with Dask:

import dask.array as da
da_ch1 = da.from_array(ch1)
da_ch2 = da.from_array(ch2)
da_ch1_2 = da.empty((2*da_ch1.shape[0],*da_ch1.shape[1:]), dtype=da_ch1.dtype)
da_ch1_2[0::2] = da_ch1
da_ch1_2[1::2] = da_ch2

It fails with "NotImplementedError: Item assignment with <class 'slice'> not supported".

Can anyone suggest a Dask-compatible alternative? Any help is much appreciated.

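Here is a high-level Dask array solution to the problem. It stacks the two channels into an array of shape (2, n_frames, ...), uses rollaxis to move the frame axis to the front, and then merges the first two axes with reshape, so the frames come out in the order ch1[0], ch2[0], ch1[1], ch2[1], ...: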

da_ch1_2=da.rollaxis(da.stack((da_ch1,da_ch2)),axis=1).reshape((-1,*da_ch1.shape[1:]))
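A slightly shorter variant, sketched under the assumption that both inputs share shape, dtype and chunking: passing axis=1 to da.stack produces the (n_frames, 2, ...) layout directly, so the rollaxis step is not needed. The snippet checks the lazy result against the NumPy slice-assignment approach from the question; the chunk size (2, 3, 2) is only an illustrative choice.

```python
import numpy as np
import dask.array as da

# Same small example data as in the question
ch1 = np.arange(1, 5)[:, np.newaxis, np.newaxis] * np.ones((4, 3, 2))
ch2 = np.arange(10, 50, 10)[:, np.newaxis, np.newaxis] * np.ones((4, 3, 2))
da_ch1 = da.from_array(ch1, chunks=(2, 3, 2))
da_ch2 = da.from_array(ch2, chunks=(2, 3, 2))

# Stack along a new axis 1: shape (n_frames, 2, height, width)
pairs = da.stack([da_ch1, da_ch2], axis=1)
# Merge the frame and channel axes: ch1[0], ch2[0], ch1[1], ch2[1], ...
da_ch1_2 = pairs.reshape((-1, *da_ch1.shape[1:]))

# Reference result built with the NumPy approach from the question
expected = np.empty((2 * ch1.shape[0], *ch1.shape[1:]), dtype=ch1.dtype)
expected[0::2] = ch1
expected[1::2] = ch2

result = da_ch1_2.compute()
print(np.array_equal(result, expected))  # True
```

Everything up to compute() stays lazy, and Dask rechunks as needed for the reshape.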

The code below handles the small example data you posted. You may also need to write a similar delayed function to read the HDF5 data.

import dask.array as da
from dask import delayed
import numpy as np

@delayed
def interleave(x1, x2):
    # allocate the output with a doubled first dimension, derived from the inputs
    x1_2 = np.empty((2*x1.shape[0], *x1.shape[1:]), dtype=x1.dtype)
    x1_2[0::2] = x1  # even frames come from channel 1
    x1_2[1::2] = x2  # odd frames come from channel 2
    return x1_2

# a numpy example of channel 1 data
ch1 = np.arange(1,5)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))
# channel 2 has the same shape and dtype
ch2 = np.arange(10,50,10)[:,np.newaxis,np.newaxis]*np.ones((4,3,2))
# Interleave using dask delayed (nothing is computed yet)
ch1_2_shape = (2*ch1.shape[0],*ch1.shape[1:])
ch1_2_delayed = interleave(ch1, ch2)
# Convert to a dask array if required
ch1_2 = da.from_delayed(ch1_2_delayed, ch1_2_shape, dtype=ch1.dtype)
ch1_2.compute()
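The delayed version above produces a dask array with a single chunk, so compute() materializes the whole interleaved result at once. For data larger than memory, one possible alternative is to interleave block by block with da.map_blocks. This is a sketch under the assumption that both inputs have identical shape, dtype and chunks. Block k of each input covers frames [s, e), and the interleaved block then covers output frames [2s, 2e), so concatenating the blocks gives the global interleaving. The helper names interleave_block and interleave_dask are hypothetical. If the channels live in HDF5 files, the inputs could come from da.from_array applied to h5py datasets.

```python
import numpy as np
import dask.array as da

def interleave_block(b1, b2):
    # Interleave one pair of aligned blocks: b1[0], b2[0], b1[1], b2[1], ...
    out = np.empty((2 * b1.shape[0], *b1.shape[1:]), dtype=b1.dtype)
    out[0::2] = b1
    out[1::2] = b2
    return out

def interleave_dask(a1, a2):
    # Hypothetical helper: assumes a1 and a2 share shape, dtype and chunking
    if a1.shape != a2.shape or a1.chunks != a2.chunks:
        raise ValueError("inputs must have identical shape and chunks")
    # Each output block is twice as long along axis 0 as its input blocks
    out_chunks = (tuple(2 * c for c in a1.chunks[0]),) + a1.chunks[1:]
    return da.map_blocks(interleave_block, a1, a2, chunks=out_chunks, dtype=a1.dtype)

# Usage with the example data, one frame per input chunk
ch1 = np.arange(1, 5)[:, np.newaxis, np.newaxis] * np.ones((4, 3, 2))
ch2 = np.arange(10, 50, 10)[:, np.newaxis, np.newaxis] * np.ones((4, 3, 2))
da_ch1 = da.from_array(ch1, chunks=(1, 3, 2))
da_ch2 = da.from_array(ch2, chunks=(1, 3, 2))
ch1_2 = interleave_dask(da_ch1, da_ch2)
print(ch1_2.chunks[0])                     # (2, 2, 2, 2)
print(ch1_2.compute()[:, 0, 0].tolist())  # [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0]
```

Each task only holds one block of each channel, so memory use scales with the chunk size rather than the full dataset.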
