Feature extraction using a Caffe model

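I would like to discuss feature extraction using the Caffe model called GoogLeNet. I am referring to the paper "End-to-end people detection in crowded scenes". Anyone familiar with Caffe should be able to help with my question.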

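The paper comes with its own library written in Python. I have gone through the library as well, but I still cannot work out some of the points mentioned in the paper.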

The input image is passed through GoogLeNet up to the inception_5b/output layer.

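The output is then a multidimensional array of shape 15x20x1024. Each 1024-dimensional vector represents a bounding box at the center of a 64x64 region. Because the regions overlap by 50%, a 640x480 image gives a 15x20 grid, and each cell holds a vector of length 1024.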
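The grid arithmetic above can be checked with a short sketch. It assumes that 50% overlap of 64x64 regions means a stride of 32 pixels, and that each region is centered on its 32x32 grid cell. The centering and the function name `grid_regions` are illustrative assumptions, not taken from the paper's code.

```python
def grid_regions(img_w=640, img_h=480, region=64, overlap=0.5):
    """Return (cols, rows, boxes) for a grid of overlapping square regions."""
    stride = int(region * (1 - overlap))  # 64 * 0.5 = 32 pixels between cells
    cols, rows = img_w // stride, img_h // stride
    half = region // 2
    boxes = {}
    for r in range(rows):
        for c in range(cols):
            # Assumption: the region is centered on the center of its cell.
            cx = c * stride + stride // 2
            cy = r * stride + stride // 2
            boxes[(r, c)] = (cx - half, cy - half, cx + half, cy + half)
    return cols, rows, boxes


cols, rows, boxes = grid_regions()
print(rows, cols)       # 15 20
print(boxes[(0, 0)])    # (-16, -16, 48, 48)
print(boxes[(7, 10)])   # (304, 208, 368, 272)
```

Under this centering assumption the border regions extend 16 pixels past the image edge, which is why 20 columns of 64-pixel regions fit into a 640-pixel width.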

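My questions are:

(1) How do I obtain this 15x20x1024 array as output?

(2) How does this 1x1x1024 data represent a 64x64 region in the image? The source code contains this description: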

"""Takes the output from the decapitated googlenet and transforms the output
    from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers.
    N = batch size, C = channels, W = grid width, H = grid height."""

and the transformation is implemented in Python by this function:

def generate_intermediate_layers(net):
    """Takes the output from the decapitated googlenet and transforms the output
    from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers.
    N = batch size, C = channels, W = grid width, H = grid height."""
    net.f(Convolution("post_fc7_conv", bottoms=["inception_5b/output"],
                      param_lr_mults=[1., 2.], param_decay_mults=[0., 0.],
                      num_output=1024, kernel_dim=(1, 1),
                      weight_filler=Filler("gaussian", 0.005),
                      bias_filler=Filler("constant", 0.)))
    net.f(Power("lstm_fc7_conv", scale=0.01, bottoms=["post_fc7_conv"]))
    net.f(Transpose("lstm_input", bottoms=["lstm_fc7_conv"]))
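
What the docstring describes can be imitated with plain NumPy. This is only a sketch of the shape change, not the library's Transpose layer: it assumes the Caffe blob layout (N, C, H, W) and a row-major ordering of the grid cells, and the name `to_lstm_input` is hypothetical.

```python
import numpy as np


def to_lstm_input(features):
    """Reorder an (N, C, H, W) array into (N*H*W, C, 1, 1)."""
    n, c, h, w = features.shape
    # Move channels last so each grid cell's C values sit together,
    # then flatten batch and grid dimensions into one axis.
    cells = features.transpose(0, 2, 3, 1).reshape(n * h * w, c)
    return cells.reshape(n * h * w, c, 1, 1)


x = np.random.rand(1, 1024, 15, 20).astype(np.float32)
y = to_lstm_input(x)
print(y.shape)  # (300, 1024, 1, 1)
```

With this ordering, row `r * 20 + c` of the result holds the 1024 features of grid cell (r, c), so each of the 300 cells becomes one independent input to the LSTM layers.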

I cannot work out this part: how does each 1x1x1024 vector represent the size of the bounding-box rectangle?

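Since you are looking at a single 1x1 cell deep inside the network, its effective receptive field is quite large and can be (and probably is) 64x64 pixels in the original image.

That is, each feature in "inception_5b/output" is affected by a 64x64 pixel area of the input image.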
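As for question (1), one way to get the array is to run the network only up to the layer of interest and read its blob. The following is a minimal sketch assuming the standard pycaffe interface (`caffe.Net`, `net.inputs`, `net.blobs`, `net.forward(end=...)`). The helper names and file names are hypothetical, and the image is assumed to be preprocessed already (channel order and mean subtraction as the model expects).

```python
import numpy as np


def extract_grid_features(net, image_chw, layer="inception_5b/output"):
    """Run `net` up to `layer` and return its features as (H, W, C)."""
    name = net.inputs[0]                              # name of the input blob
    batch = image_chw[np.newaxis].astype(np.float32)  # (1, 3, H, W)
    net.blobs[name].reshape(*batch.shape)             # fit the blob to the image
    net.blobs[name].data[...] = batch
    net.forward(end=layer)                            # stop at the requested layer
    feats = net.blobs[layer].data[0]                  # (C, H, W)
    return feats.transpose(1, 2, 0).copy()            # (H, W, C)


def load_net(prototxt, caffemodel):
    """Load a trained model in test mode (requires pycaffe)."""
    import caffe  # imported here so the function above works without Caffe
    return caffe.Net(prototxt, caffemodel, caffe.TEST)
```

For example, with `net = load_net("deploy.prototxt", "googlenet.caffemodel")` (placeholder file names) and a 3x480x640 input, `extract_grid_features(net, image)` should come back with shape (15, 20, 1024), provided the network downsamples by a factor of 32 up to that layer.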
