Clarification on torchvision's Faster R-CNN implementation

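I'm digging into the source code of torchvision's Faster R-CNN implementation and have run into something I don't quite understand. Suppose I want to create a Faster R-CNN model that is not pretrained on COCO but uses a backbone pretrained on ImageNet, and then extract just the backbone. I do the following: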

plain_backbone = fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=True).backbone.body

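This matches how the backbone is set up, as shown here and here. However, when I pass an image through the model, the result does not match what I get when I set up a resnet50 directly. Namely: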

import numpy as np
import torch
import torchvision
from PIL import Image
from torchvision import transforms
from torchvision.models.detection.faster_rcnn import fasterrcnn_resnet50_fpn
# Regular resnet50, pretrained on ImageNet, without the classifier and the average pooling layer
resnet50_1 = torch.nn.Sequential(*(list(torchvision.models.resnet50(pretrained=True).children())[:-2]))
resnet50_1.eval()
# Resnet50, extracted from the Faster R-CNN, also pretrained on ImageNet
resnet50_2 = fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=True).backbone.body
resnet50_2.eval()
# Loading a random image, converted to torch.Tensor, rescaled to [0, 1] (not that it matters)
image = transforms.ToTensor()(Image.open("random_images/random.jpg")).unsqueeze(0)
# Obtaining the model outputs
with torch.no_grad():
    # Output from the regular resnet50
    output_1 = resnet50_1(image)
    # Output from the resnet50 extracted from the Faster R-CNN
    output_2 = resnet50_2(image)["3"]
    # Their outputs aren't the same, which I would assume they should be
    np.testing.assert_almost_equal(output_1.numpy(), output_2.numpy())

Looking forward to your thoughts!

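This is because fasterrcnn_resnet50_fpn uses a custom normalization layer (FrozenBatchNorm2d) instead of the default BatchNorm2d. The two are very similar, but I suspect that small numerical differences between them are what makes the check fail.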
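To make the "small numerical differences" concrete, here is a minimal pure-Python sketch of the arithmetic, not torchvision's actual code. It assumes, purely for illustration, that the frozen layer folds the fixed statistics into a scale and a shift and uses a different eps than the regular layer; whether that holds depends on your torchvision version. The function names and the statistics are made up.

```python
import math

def batchnorm_eval(x, weight, bias, mean, var, eps=1e-5):
    """Eval-mode batch norm for one value: normalize, then scale and shift."""
    return (x - mean) / math.sqrt(var + eps) * weight + bias

def frozen_batchnorm(x, weight, bias, mean, var, eps=0.0):
    """Frozen variant (illustrative): fold the fixed statistics into scale and shift."""
    scale = weight / math.sqrt(var + eps)
    shift = bias - mean * scale
    return x * scale + shift

# Made-up per-channel statistics, for illustration only.
params = dict(weight=1.2, bias=0.1, mean=0.5, var=0.01)

y_bn = batchnorm_eval(2.0, **params)        # eps = 1e-5
y_frozen = frozen_batchnorm(2.0, **params)  # eps = 0.0 (assumed difference)
diff = abs(y_bn - y_frozen)
print(y_bn, y_frozen, diff)
```

With the same eps the two formulas agree up to floating-point rounding, so any difference of this kind comes from the constants rather than from the weights. Across the roughly fifty normalization layers of a resnet50, small per-layer deviations like this can be enough to fail a strict equality check.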

If you specify the same normalization layer for the standard resnet, the check passes:

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import fasterrcnn_resnet50_fpn
import numpy as np
from torchvision.ops import misc as misc_nn_ops
# Regular resnet50, pretrained on ImageNet, without the classifier and the average pooling layer
resnet50_1 = torch.nn.Sequential(*(list(torchvision.models.resnet50(pretrained=True, norm_layer=misc_nn_ops.FrozenBatchNorm2d).children())[:-2]))
resnet50_1.eval()
# Resnet50, extracted from the Faster R-CNN, also pretrained on ImageNet
resnet50_2 = fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=True).backbone.body
resnet50_2.eval()
# I am too lazy to get a real image, so use a constant input
image = torch.ones((1, 3, 224, 224))
# Obtaining the model outputs
with torch.no_grad():
    # Output from the regular resnet50
    output_1 = resnet50_1(image)
    # Output from the resnet50 extracted from the Faster R-CNN
    output_2 = resnet50_2(image)["3"]
    # Passes
    np.testing.assert_almost_equal(output_1.numpy(), output_2.numpy())
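As a side note, np.testing.assert_almost_equal compares to 7 decimals by default, which is quite strict for network outputs. If exact agreement between the two backbones is not required, an explicit tolerance may be a reasonable alternative. The sketch below uses made-up values rather than real model outputs, and the tolerances are arbitrary choices for illustration.

```python
import numpy as np

# Made-up stand-ins for two model outputs that differ by about 1e-5.
reference = np.array([18.1, 0.25, -3.0])
perturbed = reference + 1e-5

# The default check (decimal=7) rejects a 1e-5 discrepancy.
try:
    np.testing.assert_almost_equal(reference, perturbed)
    strict_check_passed = True
except AssertionError:
    strict_check_passed = False

# An explicit tolerance accepts it; this raises AssertionError if it does not.
np.testing.assert_allclose(reference, perturbed, rtol=1e-4, atol=1e-4)
loose_check_passed = True

print(strict_check_passed, loose_check_passed)
```

A loose tolerance only hides the discrepancy, so it is still worth knowing where it comes from, as in the answer above.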
