PyTorch: inner product of a 3D tensor with a 1D tensor to produce a 2D tensor

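Operation: I have a PyTorch tensor A of dimensions [n x m x c] and a tensor B of dimensions [1 x 1 x c]. I want to take the inner product of each 1 x 1 x c vector in A with B, producing a tensor C of dimensions [n x m].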

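In my network's forward function, at a particular step, I receive a tensor of dimensions [N, channels, Height, Width], where N is the number of images, channels is the number of channels in the feature map, and Height and Width are the height and width of the current feature map. I also have an [N x channels] feature map coming from another subnetwork. In the next step, I want to perform the operation described above.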

Can someone explain the best way to do this in PyTorch, and which functions to use?

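I am new to PyTorch and could not find a suitable way to do this. TensorFlow supports the NHWC format, but I don't think PyTorch does, so one approach is to reshape the tensor to [N, Height, Width, channels] and then iterate like this: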

# if img is reshaped to [N, H, W, C]
img
# tensor of dimension [N, C]
aud
ans = torch.empty(N, H, W, dtype=torch.double)
for batches in range(img.shape[0]):
    for i in range(img.shape[1]):
        for j in range(img.shape[2]):
            ans[batches][i][j] = torch.dot(img[batches][i][j], aud[batches])

Is there a cleaner API for this?

P.S. This step is needed for the sound localization step in DeepMind's paper "Objects that Sound".

There is a one-liner:

ans = torch.einsum('nhwc,nc->nhw', img, aud)

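The API of torch.einsum can be hard to grasp if you have no prior experience with it, but it is extremely powerful and generalizes a great many linear-algebra operations (transposition, matrix multiplication, and trace).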
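To make that generalization concrete, here is a small sketch of how transposition, matrix multiplication, and trace each map to an einsum subscript string. The tensors `a` and `b` and their shapes are made up for illustration and are not part of the original problem.

```python
import torch

torch.manual_seed(0)
a = torch.randn(4, 4, dtype=torch.double)  # hypothetical square matrix
b = torch.randn(4, 5, dtype=torch.double)  # hypothetical second matrix

# Transposition: swap the order of the output indices.
transposed = torch.einsum('ij->ji', a)

# Matrix multiplication: the shared index j is summed over.
product = torch.einsum('ij,jk->ik', a, b)

# Trace: a repeated index with an empty output sums the diagonal.
trace = torch.einsum('ii->', a)

assert torch.equal(transposed, a.t())
assert torch.allclose(product, a @ b)
assert torch.allclose(trace, torch.trace(a))
```

Read this way, the one-liner above is a batched dot product: `c` appears in both inputs but not in the output, so it is summed over, while `n` is kept so each image is paired with its own row of `aud`.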

import torch
N, H, W, C = 10, 11, 12, 13
img = torch.randn(N, H, W, C)
aud = torch.randn(N, C)
ans = torch.empty(N, H, W)
for batches in range(img.shape[0]):
    for i in range(img.shape[1]):
        for j in range(img.shape[2]):
            ans[batches][i][j] = torch.dot(img[batches][i][j], aud[batches])
ans2 = torch.einsum('nhwc,nc->nhw', img, aud)
assert torch.allclose(ans, ans2, atol=1e-6)

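Note that, because of numerical precision issues, I had to raise the assertion tolerance above the default of 1e-8. If einsum ever becomes a bottleneck in more advanced use cases, take a look at opt_einsum, which optimizes the order of the underlying operations to improve performance.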
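Since the tensor in the question arrives as [N, channels, Height, Width], it may be worth noting that einsum can contract over the channel axis in place, without first moving the channels to the last position. The sketch below assumes that layout (the names `img_nchw` and `img_nhwc` are illustrative) and checks the result against the permuted version and a broadcasting equivalent. It uses double precision, which in this small example is enough for the default tolerance of torch.allclose to pass.

```python
import torch

torch.manual_seed(0)
N, C, H, W = 10, 13, 11, 12
img_nchw = torch.randn(N, C, H, W, dtype=torch.double)  # layout from the question
aud = torch.randn(N, C, dtype=torch.double)

# Contract over the channel axis directly in NCHW layout.
ans_nchw = torch.einsum('nchw,nc->nhw', img_nchw, aud)

# Same result by permuting to NHWC first. Note that permute reorders axes,
# whereas reshape would only reinterpret the memory and scramble the channels.
img_nhwc = img_nchw.permute(0, 2, 3, 1)
ans_nhwc = torch.einsum('nhwc,nc->nhw', img_nhwc, aud)

# Broadcasting equivalent: multiply by aud viewed as [N, C, 1, 1], sum over C.
ans_bcast = (img_nchw * aud[:, :, None, None]).sum(dim=1)

assert ans_nchw.shape == (N, H, W)
assert torch.allclose(ans_nchw, ans_nhwc)
assert torch.allclose(ans_nchw, ans_bcast)
```

The broadcasting form materializes an intermediate tensor of the same size as the input before summing, so the einsum form is probably the better default for large feature maps.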
