I'm using the following:
- CUDA 10.0
- PyTorch 1.2
- https://github.com/ruotianluo/pytorch-faster-rcnn
- The test weight set differs from the training weight set. The training
- weight set comes from a Caffe-pretrained ResNet101 backbone.
I have taken this repo and converted it to use KITTI data. In the process, I added a new KITTI dataset class and made the necessary conversions. Testing and evaluation both use the following class set from PASCAL VOC:
self._classes = (
'__background__', # always index 0
'aeroplane',
'bicycle',
'bird',
'boat',
'bottle',
'bus',
'car',
'cat',
'chair',
'cow',
'diningtable',
'dog',
'horse',
'motorbike',
'person',
'pottedplant',
'sheep',
'sofa',
'train',
'tvmonitor')
I have changed the class set to:
self._classes = (
'dontcare', # always index 0
'pedestrian',
'car',
'truck',
'cyclist')
#-----------------------------
N.B.: Classes should NOT matter here, as the output of the backbone is simply a feature map, not a classification
#-----------------------------
On seemingly random images (pulling these "problem" images out of the training set just changes which image the run fails on), the training code produces NaNs out of the region proposal network. I can't quite figure out why. I have:
- Tried changing the normalization to KITTI-specific normalization values
- Tried resizing the images to 224x224
- Tried dividing the normalized values by the per-channel standard deviation
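For context, the per-channel normalization I mean looks roughly like the sketch below. The mean/std numbers here are hypothetical placeholders, not the values I actually used; in practice they should be computed from the KITTI training images themselves:

```python
import numpy as np

# Hypothetical per-channel values; compute the real ones from the KITTI training set.
KITTI_MEAN = np.array([96.0, 98.0, 96.0], dtype=np.float32)
KITTI_STD = np.array([58.0, 60.0, 61.0], dtype=np.float32)

def normalize_image(image, mean=KITTI_MEAN, std=KITTI_STD):
    """Subtract the per-channel mean and divide by the per-channel std (HWC image)."""
    return (image.astype(np.float32) - mean) / std
```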
-----------------
Network definition
-----------------
self.conv1 = conv3x3(planes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride

self._layers['head'] = nn.Sequential(self.resnet.conv1, self.resnet.bn1,
                                     self.resnet.relu, self.resnet.maxpool,
                                     self.resnet.layer1, self.resnet.layer2,
                                     self.resnet.layer3)
self.rpn_net = nn.Conv2d(self._net_conv_channels, cfg.RPN_CHANNELS, [3, 3], padding=1)
-----------------
Preparing the image
-----------------
self._image = torch.from_numpy(image.transpose([0, 3, 1, 2])).to(self._device)
self.net.train_step(blobs, self.optimizer)
-----------------
Computation graph
-----------------
(1) self.forward(blobs['data'], blobs['im_info'], blobs['gt_boxes'])
(2) rois, cls_prob, bbox_pred = self._predict()
(3) net_conv = self._image_to_head()
(4) net_conv = self._layers['head'](self._image)
(5) rpn = F.relu(self.rpn_net(net_conv))
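To narrow down which step in this chain first goes non-finite, one generic debugging option (this is not from the repo, just a sketch) is to register forward hooks that flag the first module whose output contains NaN:

```python
import torch
import torch.nn as nn

def register_nan_hooks(model):
    """Attach forward hooks that raise on the first module emitting NaNs."""
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and torch.isnan(output).any():
            raise RuntimeError(f"first NaN output from {module.__class__.__name__}")
    # One hook per submodule; keep the handles so they can be removed later.
    return [m.register_forward_hook(hook) for m in model.modules()]
```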
-------------------
Useful functions for troubleshooting
-------------------
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)

def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
I don't know why this is happening, but I obviously expect actual numbers out of the ResNet101 backbone. I may have to switch to VGG16.
Output of step (3):
tensor([[[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...,
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan]]]], device='cuda:0')
Does anyone know what is going on here?
Solved. Pascal VOC (the original dataset this GitHub repo was written for) indexes pixel positions starting at 1 [1 to ymax], whereas KITTI pixels start at 0 [0 to ymax-1].
The -1 had to be removed from the bounding box target generation.
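For anyone hitting the same thing: the repo's VOC loader shifts the 1-based annotation coordinates down by one, which turns a KITTI box that already starts at 0 into a negative coordinate; those out-of-range boxes can eventually poison the RPN regression targets. A minimal illustration (function names are mine, not the repo's):

```python
import numpy as np

def load_voc_box(xmin, ymin, xmax, ymax):
    """Pascal VOC annotations are 1-based, so the VOC loader subtracts 1."""
    return np.array([xmin - 1, ymin - 1, xmax - 1, ymax - 1], dtype=np.float32)

def load_kitti_box(xmin, ymin, xmax, ymax):
    """KITTI annotations are already 0-based: keep the coordinates as-is."""
    return np.array([xmin, ymin, xmax, ymax], dtype=np.float32)
```

Applying the VOC-style shift to a KITTI box at the image edge yields a -1 coordinate, which is exactly the corruption that had to be removed.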