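How to reduce a fully-connected ("InnerProduct") layer using truncated SVD

In the paper Girshick, R., Fast R-CNN (ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes using an SVD trick to reduce the size and computation time of a fully connected layer.

Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?

Some linear-algebra background
Singular Value Decomposition (SVD) is a decomposition of any matrix W into three matrices: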

W = U S V*

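where U and V are orthonormal (unitary) matrices, V* is the conjugate transpose of V, and S is diagonal with non-negative entries in decreasing order along the diagonal. A useful property of the SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S to keep only its k leading elements (instead of all elements on the diagonal), then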

W_app = U S_trunc V*

is a rank-k approximation of W (the closest one in the Frobenius norm).
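
A minimal NumPy sketch of this truncation, using a random matrix as a stand-in for W. The sizes and the value of k are arbitrary choices for illustration, not values from the question. Note that np.linalg.svd returns the third factor already conjugate-transposed, i.e. V* rather than V.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 30))   # stand-in for a weight matrix
k = 5                               # number of singular values to keep

# full_matrices=False gives the "economy" SVD: U is 50x30, Vh is 30x30
U, s, Vh = np.linalg.svd(W, full_matrices=False)

# keep the k leading singular values and the matching columns of U / rows of Vh
W_app = (U[:, :k] * s[:k]) @ Vh[:k, :]

rank = np.linalg.matrix_rank(W_app)
# the Frobenius error equals the norm of the discarded singular values
err = np.linalg.norm(W - W_app)
tail = np.sqrt(np.sum(s[k:] ** 2))
print(rank, err, tail)
```

The last two printed numbers should agree up to rounding, which is a quick way to confirm that the truncation was done on the right axes.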

Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer:

# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...

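Furthermore, we have trained_weights_full.caffemodel, the trained parameters for the deploy_full.prototxt model.

Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers: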

layer {
  name: "fc_svd_U"
  type: "InnerProduct"
  bottom: "in" # same input
  top: "svd_interim"
  inner_product_param {
    num_output: 20  # approximate with k = 20 rank matrix
    bias_term: false
    # more params...
  }
  # some more...
}
# NO activation layer here!
layer {
  name: "fc_svd_V"
  type: "InnerProduct"
  bottom: "svd_interim"
  top: "out"   # same output
  inner_product_param {
    num_output: 1000  # original number of outputs
    # more params...
  }
  # some more...
}
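
To see why this replacement is cheaper: an InnerProduct layer with n inputs and m outputs stores an m x n weight matrix, while the pair above stores a k x n matrix followed by an m x k matrix, i.e. k(n + m) weights instead of m n. The same counts apply to multiplications per input vector. Because both layers are linear and there is no activation between them, their composition is still a single linear map of rank at most k. The sketch below only does this arithmetic; the input size of 4096 is an assumed value (the question does not state the size of "in"), and the helper name is hypothetical.

```python
def fc_weight_counts(n_in, n_out, k):
    """Return (full, factored) weight counts, ignoring biases."""
    full = n_in * n_out              # one n_out x n_in matrix
    factored = k * n_in + n_out * k  # k x n_in followed by n_out x k
    return full, factored

# n_in = 4096 is an assumption; n_out = 1000 and k = 20 come from the prototxt
full, factored = fc_weight_counts(n_in=4096, n_out=1000, k=20)
print(full, factored, full / factored)
```

The factored form only saves anything when k < n m / (n + m), so k should be chosen well below that bound.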

In Python, a little net surgery:

import caffe
import numpy as np

orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# get the original weight matrix, shape (num_output, input_dim)
W = np.array(orig_net.params['fc_orig'][0].data)
# SVD decomposition: W = U * diag(s) * Vh (numpy returns V* directly)
k = 20  # same as num_output of fc_svd_U
U, s, Vh = np.linalg.svd(W, full_matrices=False)
# assign weights to the svd net, keeping only the k leading singular values.
# Caffe computes out = W * in, so the layer that sees "in" (fc_svd_U,
# despite its name) takes the V* factor and the next one takes U * S.
svd_net.params['fc_svd_U'][0].data[...] = Vh[:k, :]         # shape (k, input_dim)
svd_net.params['fc_svd_V'][0].data[...] = U[:, :k] * s[:k]  # shape (num_output, k)
svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias
# save the new weights
svd_net.save('trained_weights_svd.caffemodel')

Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights.
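
One way to sanity-check the factorization without Caffe is to mimic the InnerProduct computation (y = W x + b) in NumPy and compare the one-layer and two-layer outputs. This is a sketch on random data with made-up sizes, not a test of the actual caffemodel; the low-rank-plus-noise matrix stands in for trained weights whose singular values decay quickly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, k = 64, 100, 20        # made-up sizes for illustration

# stand-in "trained" weights: rank-k signal plus small noise
W = rng.standard_normal((n_out, k)) @ rng.standard_normal((k, n_in))
W += 0.01 * rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)

U, s, Vh = np.linalg.svd(W, full_matrices=False)
W1 = Vh[:k, :]                      # first layer:  k x n_in, no bias
W2 = U[:, :k] * s[:k]               # second layer: n_out x k, original bias

x = rng.standard_normal(n_in)
y_full = W @ x + b                  # original layer
y_svd = W2 @ (W1 @ x) + b           # two-layer replacement, no activation between

rel_err = np.linalg.norm(y_full - y_svd) / np.linalg.norm(y_full)
print(rel_err)
```

With the real nets, the analogous check would be to feed the same input to both networks and compare their "out" blobs; how small the difference is depends on how fast the singular values of the trained W decay.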

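In fact, Ross Girshick's py-faster-rcnn repo includes an implementation of the SVD step: compress_net.py.

BTW, you usually need to fine-tune the compressed model to recover accuracy (or compress in a more sophisticated way; see for example "Accelerating Very Deep Convolutional Networks for Classification and Detection", Zhang et al.).

Also, for me scipy.linalg.svd was faster than numpy's svd.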
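
A rough way to compare the two on your own machine, assuming SciPy is installed. The matrix size is made up, a single timing is noisy, and which routine wins depends on the BLAS/LAPACK builds the two libraries are linked against, so treat the output as indicative only.

```python
import time
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
W = rng.standard_normal((300, 500)).astype(np.float32)  # made-up size

def timed(fn):
    """Return (result, elapsed seconds) for a single call to fn."""
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

(_, s_np, _), t_np = timed(lambda: np.linalg.svd(W, full_matrices=False))
(_, s_sp, _), t_sp = timed(lambda: scipy.linalg.svd(W, full_matrices=False))

print("numpy: %.4fs  scipy: %.4fs" % (t_np, t_sp))
```

Both calls return the singular values in decreasing order and the third factor as V*, so either can be dropped into the net surgery script.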
