PyTorch: max pooling over the channel dimension

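I am trying to build a CNN with PyTorch and ran into trouble with max pooling. I took Stanford's CS231n course. As I recall, max pooling can be used as a dimensionality-reduction step. For example, suppose I pass an input of shape (1, 20, height, width) to max_pool2d (assuming my batch_size is 1). If I use a (1, 1) kernel, I want an output of shape (1, 1, height, width), which means the kernel should slide over the channel dimension. However, the PyTorch documentation says the kernel slides over height and width. Thanks to @ImgPrcsng on the PyTorch forum, who told me to use max_pool3d, and that worked well. But there is still a reshape operation between the output of the Conv2d layer and the input of the max_pool3d layer, so it is hard to wrap everything into a single nn module.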
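For reference, the max_pool3d workaround described in the question could look roughly like the sketch below. The input shape (1, 20, 8, 6) is made up for illustration, and the unsqueeze/squeeze calls stand in for the "reshape operation" the question mentions; the forum answer itself may have done this differently.

```python
import torch
import torch.nn.functional as F

# Hypothetical input: batch of 1, 20 channels, 8x6 spatial size.
x = torch.randn(1, 20, 8, 6)

# max_pool3d expects (N, C, D, H, W). Add a dummy channel axis so the
# original 20 channels become the depth axis D.
x5d = x.unsqueeze(1)  # (1, 1, 20, 8, 6)

# The kernel spans all 20 depth entries and a single pixel spatially.
pooled = F.max_pool3d(x5d, kernel_size=(20, 1, 1))  # (1, 1, 1, 8, 6)

# Drop the dummy axis to get back a 4-D (N, 1, H, W) tensor.
out = pooled.squeeze(1)  # (1, 1, 8, 6)
```

The two extra reshaping steps around the pooling call are what make this awkward to drop between two layers of a model.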

Would something like this work?

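from torch.nn import MaxPool1d
import torch.nn.functional as F

class ChannelPool(MaxPool1d):
    # Max pooling along the channel axis of an (N, C, W, H) tensor.
    # Note: assumes return_indices=False; with True, max_pool1d returns a tuple.
    def forward(self, input):
        n, c, w, h = input.size()
        # Flatten the spatial dims and move channels last: (N, W*H, C).
        # reshape (rather than view) also accepts non-contiguous input.
        input = input.reshape(n, c, w * h).permute(0, 2, 1)
        pooled = F.max_pool1d(
            input,
            self.kernel_size,
            self.stride,
            self.padding,
            self.dilation,
            self.ceil_mode,
            self.return_indices,
        )
        # c is now the number of channels left after pooling.
        _, _, c = pooled.size()
        # Move channels back to dim 1 and restore the spatial dims.
        pooled = pooled.permute(0, 2, 1)
        return pooled.reshape(n, c, w, h)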

Or, using einops:

from torch.nn import MaxPool1d
import torch.nn.functional as F
from einops import rearrange

class ChannelPool(MaxPool1d):
    def forward(self, input):
        # Move channels last, pool along them with max_pool1d, then restore (n, c, w, h).
        n, c, w, h = input.size()
        pool = lambda x: F.max_pool1d(
            x,
            self.kernel_size,
            self.stride,
            self.padding,
            self.dilation,
            self.ceil_mode,
            self.return_indices,
        )
        return rearrange(
            pool(rearrange(input, "n c w h -> n (w h) c")),
            "n (w h) c -> n c w h",
            n=n,
            w=w,
            h=h,
        )
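To see what this permute-and-pool approach computes, here is a small standalone sketch of the same idea written as a plain function. The name channel_max_pool and the tensor sizes are made up for illustration; only torch and torch.nn.functional are assumed.

```python
import torch
import torch.nn.functional as F

def channel_max_pool(x, kernel_size, stride=None):
    """Max-pool an (N, C, H, W) tensor along its channel axis (illustrative helper)."""
    n, c, h, w = x.shape
    # (N, C, H, W) -> (N, H*W, C): channels become the axis max_pool1d slides over.
    flat = x.reshape(n, c, h * w).permute(0, 2, 1)
    pooled = F.max_pool1d(flat, kernel_size, stride)  # (N, H*W, C_out)
    c_out = pooled.shape[-1]
    # (N, H*W, C_out) -> (N, C_out, H, W)
    return pooled.permute(0, 2, 1).reshape(n, c_out, h, w)

x = torch.randn(2, 20, 8, 6)

# A kernel spanning all 20 channels leaves a single channel.
full = channel_max_pool(x, kernel_size=20)

# A kernel of 2 (stride defaults to the kernel size) halves the channel count.
half = channel_max_pool(x, kernel_size=2)
```

The second call is the case where this approach is worth its extra code: it pools over groups of channels rather than collapsing all of them at once.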

To take the maximum at each spatial coordinate over all channels, simply use the layer from einops:

from einops.layers.torch import Reduce
max_pooling_layer = Reduce('b c h w -> b 1 h w', 'max')

The layer can be used in your model like any other torch module.

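I'm not sure why the other answers are so complicated. Max pooling over the entire channel dimension to get an output with only 1 channel sounds equivalent to just taking the maximum over that dimension: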

torch.amax(left_images, dim=1, keepdim=True)
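If the reduction needs to sit inside a model, one option is to wrap it in a small module. The sketch below is only an illustration: the class name ChannelMax, the Conv2d settings, and the input size are assumptions, not part of the answer above.

```python
import torch
from torch import nn

class ChannelMax(nn.Module):
    """Hypothetical wrapper: maximum over the channel axis, keeping a size-1 channel."""

    def forward(self, x):
        return torch.amax(x, dim=1, keepdim=True)

# Illustrative model: a conv layer producing 20 channels, then the channel-wise max.
model = nn.Sequential(
    nn.Conv2d(3, 20, kernel_size=3, padding=1),
    ChannelMax(),
)

x = torch.randn(1, 3, 8, 6)
out = model(x)  # shape (1, 1, 8, 6)
```

Because the wrapper has no parameters and needs no reshaping, it can follow a Conv2d layer directly.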
