What is the fastest way to append newly resized image matrices to a new array?


last_conv_w, last_conv_h, n_channels = last_conv_output.shape
upscaled_h = last_conv_h * height_factor
upscaled_w = last_conv_w * width_factor
upsampled_last_conv_output = np.zeros((upscaled_h, upscaled_w, n_channels))
for x in range(0, n_channels, 512):
    upsampled_last_conv_output[:, :, x:x+512] = cv2.resize(last_conv_output[:, :, x:x+512], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
upsampled_last_conv_output.shape

The code here resizes an image matrix last_conv_output, whose initial shape is (7, 7, 2048). I thought it might be possible to simply do this:

upsampled_last_conv_output = cv2.resize(last_conv_output, (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)

But the problem is that cv2.resize() can only handle up to 512 channels at a time. To work around this, I wrote a for loop that processes 512 channels per iteration and writes them into the array upsampled_last_conv_output. On my machine this approach takes about 2.5 seconds to complete.

Before I came up with the for-loop solution, I also tried this approach:

upsampled_last_conv_output_1 = cv2.resize(last_conv_output[:, :, :512], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
upsampled_last_conv_output_2 = cv2.resize(last_conv_output[:, :, 512:1024], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
upsampled_last_conv_output_3 = cv2.resize(last_conv_output[:, :, 1024:1536], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
upsampled_last_conv_output_4 = cv2.resize(last_conv_output[:, :, 1536:2048], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
upsampled_last_conv_output = np.concatenate((upsampled_last_conv_output_1,
                                             upsampled_last_conv_output_2,
                                             upsampled_last_conv_output_3,
                                             upsampled_last_conv_output_4),
                                            axis=2)

This approach takes about 0.9 seconds, much faster than the previous one, but it looks very un-Pythonic (what happens if we have around a million channels, or something like that?).

So my question is: is there a way to combine the speed of the second approach with the Pythonic style of the first, or is there a better way to handle this altogether?

You can accumulate the arrays in a list:

alist = []
for x in range(0, n_channels, 512):
    alist.append(cv2.resize(last_conv_output[:, :, x:x+512], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC))
upsampled_last_conv_output = np.concatenate(alist, axis=2)

I haven't tested this; I'm just trying to combine the iteration from your first case with the concatenate from your second.

I'm surprised there's such a large time difference. I suspect the assignment upsampled_last_conv_output[:, :, x:x+512] = ... may be relatively expensive.
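If you want to check whether the slice assignment itself is the bottleneck, you can time it in isolation with NumPy alone, using a precomputed random array as a stand-in for the cv2.resize output (the names and sizes here are illustrative):

```python
import time
import numpy as np

h, w, n_channels, step = 21, 21, 2048, 512
# Stand-in for one resized 512-channel chunk; in the real code this
# would come from cv2.resize.
chunk = np.random.uniform(size=(h, w, step)).astype(np.float32)

# Method 1: write each chunk into a preallocated array.
out_prealloc = np.zeros((h, w, n_channels))  # note: float64 by default
start = time.time()
for x in range(0, n_channels, step):
    out_prealloc[:, :, x:x+step] = chunk     # implicit float32 -> float64 cast
t_assign = time.time() - start

# Method 2: collect the chunks and concatenate once.
start = time.time()
out_concat = np.concatenate([chunk] * (n_channels // step), axis=2)
t_concat = time.time() - start

print(f"slice assignment: {t_assign:.6f}s, concatenate: {t_concat:.6f}s")
```

One thing this highlights: if the preallocated array is float64 while cv2.resize returns float32, every assignment also pays for a dtype conversion, so preallocating with dtype=np.float32 may close part of the gap.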

Another idea is to add a dimension:

upsampled_last_conv_output = np.zeros((upscaled_h, upscaled_w, n_channels//512, 512))

Then the iteration is simpler:

for i in range(n_channels // 512):
    upsampled_last_conv_output[:, :, i, :] = cv2.resize(...)

Similarly, reshape last_conv_output so that you can iterate over the second-to-last dimension.
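A minimal NumPy-only sketch of that idea, with np.repeat standing in for cv2.resize purely so the reshaping logic can be run without OpenCV (the variable names are illustrative):

```python
import numpy as np

last_conv_h, last_conv_w, n_channels, step = 7, 7, 2048, 512
factor = 3
upscaled_h, upscaled_w = last_conv_h * factor, last_conv_w * factor
n_groups = n_channels // step

last_conv_output = np.random.uniform(
    size=(last_conv_h, last_conv_w, n_channels)).astype(np.float32)

# Expose the 512-channel groups as their own axis; group i covers the
# same channels as the slice [:, :, i*512:(i+1)*512].
grouped = last_conv_output.reshape(last_conv_h, last_conv_w, n_groups, step)

out = np.zeros((upscaled_h, upscaled_w, n_groups, step), dtype=np.float32)
for i in range(n_groups):
    # Stand-in for cv2.resize(grouped[:, :, i, :], (upscaled_w, upscaled_h),
    #                         interpolation=cv2.INTER_CUBIC)
    out[:, :, i, :] = np.repeat(
        np.repeat(grouped[:, :, i, :], factor, axis=0), factor, axis=1)

# Collapse the group axis back into a single channel axis.
upsampled = out.reshape(upscaled_h, upscaled_w, n_channels)
```

The final reshape is essentially free (no copy), since the group and channel axes are already contiguous in the right order.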

You may also want to verify that your two approaches perform the same number of resizes.

I haven't examined your code closely, so I may be missing details. I haven't tested anything either, obviously, since you didn't provide a [mcve].

Your first approach, which loops and preallocates memory for all the results, should be the fastest, because it avoids the extra copy that np.concatenate makes. Are you sure your timings are correct?

I put together a simple snippet that measures the execution time over multiple runs, and got the following results:

Elapsed time without loop: 68.31360507011414
Elapsed time with loop: 59.28367280960083

Code:


import time
import cv2
import numpy as np
n_channels = 2048
last_conv_h = 200
last_conv_w = 200
upscaled_h = last_conv_h * 3
upscaled_w = last_conv_w * 3
n_repetitions = 50
last_conv_output = np.random.uniform(size=(last_conv_h, last_conv_w, n_channels)).astype(np.float32)
upsampled_last_conv_output = np.zeros((upscaled_h, upscaled_w, n_channels))  # note: float64, while cv2.resize returns float32

start_time = time.time()
for _ in range(n_repetitions):
    upsampled_last_conv_output_1 = cv2.resize(last_conv_output[:, :, :512], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
    upsampled_last_conv_output_2 = cv2.resize(last_conv_output[:, :, 512:1024], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
    upsampled_last_conv_output_3 = cv2.resize(last_conv_output[:, :, 1024:1536], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
    upsampled_last_conv_output_4 = cv2.resize(last_conv_output[:, :, 1536:2048], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
    upsampled_last_conv_output = np.concatenate((upsampled_last_conv_output_1,
                                                 upsampled_last_conv_output_2,
                                                 upsampled_last_conv_output_3,
                                                 upsampled_last_conv_output_4),
                                                axis=2)
elapsed = time.time() - start_time
print(f"Elapsed time without loop: {elapsed}")

start_time = time.time()
for _ in range(n_repetitions):
    for x in range(0, n_channels, 512):
        upsampled_last_conv_output[:, :, x:x+512] = cv2.resize(last_conv_output[:, :, x:x+512], (upscaled_w, upscaled_h), interpolation=cv2.INTER_CUBIC)
elapsed = time.time() - start_time
print(f"Elapsed time with loop: {elapsed}")
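Besides timing, it's worth asserting that the two methods actually produce the same array. A NumPy-only sketch of that check, again using np.repeat as a stand-in for cv2.resize so the comparison logic itself can be run anywhere (names are illustrative):

```python
import numpy as np

def fake_resize(img, factor=3):
    """Stand-in for cv2.resize: nearest-neighbour upscale by `factor`."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

h, w, n_channels, step = 7, 7, 2048, 512
src = np.random.uniform(size=(h, w, n_channels)).astype(np.float32)

# Method 1: preallocate and assign slices.
by_assign = np.zeros((h * 3, w * 3, n_channels), dtype=np.float32)
for x in range(0, n_channels, step):
    by_assign[:, :, x:x+step] = fake_resize(src[:, :, x:x+step])

# Method 2: collect the chunks and concatenate.
by_concat = np.concatenate(
    [fake_resize(src[:, :, x:x+step]) for x in range(0, n_channels, step)],
    axis=2)

assert np.array_equal(by_assign, by_concat)
print("both methods match:", by_assign.shape)
```

Note the matching dtype on the preallocated array here; with a float64 buffer the values would still match only up to a cast.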
