TensorFlow converges, but training accuracy with AlexNet on the MNIST dataset is very low



I wrote a TensorFlow program to try AlexNet on the MNIST dataset, but strangely my network converges very quickly and the loss barely changes. The accuracy on each batch is also very low, below 0.1, like this:

step 0   loss 2.29801 train_accuracy 0.14
step 100 loss 2.30258 train_accuracy 0.07
step 200 loss 2.30258 train_accuracy 0.15
step 300 loss 2.30258 train_accuracy 0.09
step 400 loss 2.30258 train_accuracy 0.08
step 500 loss 2.30258 train_accuracy 0.06
step 600 loss 2.30258 train_accuracy 0.15
step 700 loss 2.30258 train_accuracy 0.16
....
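
The loss seems pinned at 2.30258, which as far as I can tell is exactly ln(10), the cross-entropy you get when the softmax output is uniform over the 10 classes:

import numpy as np
print(-np.log(1.0 / 10))  # 2.302585..., exactly the value the loss is stuck at

So it looks like the network has stopped discriminating between the classes entirely.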



Here is my code:

import tensorflow as tf
from tensorflow.python import debug as tf_debug
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
data_dir='./mnist'
mnist = input_data.read_data_sets(data_dir,one_hot=True)
def conv2d(name, x, ws, bs, strides=1):
    w = tf.Variable(tf.truncated_normal(ws,stddev=0.01))
    b = tf.Variable(tf.constant(0.,shape=bs))
    x = tf.nn.conv2d(x, w, strides=[1,strides,strides,1], padding='SAME')
    x = tf.nn.bias_add(x,b)
    return tf.nn.relu(x, name=name)
def maxpool2d(name, x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                      padding='SAME', name=name)
def fc_op(name, x, n_out):
    n_in = x.get_shape()[-1].value
    w = tf.Variable(tf.truncated_normal([n_in,n_out],stddev=0.01))
    b = tf.Variable(tf.constant(0.,shape=[n_out]))
    x = tf.matmul(x, w)
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x, name=name)
def alex_net(x,num_classes):
    conv1 = conv2d('conv1', x, [11,11,1,96] , [96], strides=4)
    pool1 = maxpool2d('pool1', conv1)
    conv2 = conv2d('conv2', pool1, [5,5,96,256] , [256])
    pool2 = maxpool2d('pool2', conv2)
    conv3 = conv2d('conv3', pool2, [3,3,256,384] , [384])
    conv4 = conv2d('conv4', conv3, [3,3,384,384] , [384])
    conv5 = conv2d('conv5', conv4, [3,3,384,256] , [256])
    pool5 = maxpool2d('pool5', conv5)
    shp = pool5.get_shape()
    flattened_shape = shp[1].value*shp[2].value*shp[3].value
    resh = tf.reshape(pool5, shape=[-1,flattened_shape], name='resh')
    fc1 = fc_op('fc1', resh, 4096)
    fc2 = fc_op('fc2', fc1, 4096)
    fc3 = fc_op('fc3', fc2, num_classes)
    return fc3
# ############################ argument settings
learning_rate = 0.01
train_steps= 8000
num_classes = 10  
x = tf.placeholder(shape=[None, 784],dtype=tf.float32)
x_image = tf.reshape(x, [-1, 28, 28, 1])
y=tf.placeholder(shape=[None,10],dtype=tf.float32)
output=alex_net(x_image,num_classes)  
# ################################### train
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output,labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# ###################################  inference
correct_pred = tf.equal(tf.argmax(output,1),tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32))
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
for i in range(train_steps):
    xs, ys = mnist.train.next_batch(100)
    sess.run(train_op,feed_dict={x:xs,y:ys})
    if i%100==0:
        loss,train_accuracy = sess.run([cost,accuracy],feed_dict={x:xs,y:ys})
        print('step',i,'loss',loss,'train_accuracy',train_accuracy)

In fact I tried not only MNIST but also CIFAR-10, and I ran into the same problem.

The problem is that your network has many layers and each layer has a very high depth (number of filters). On top of that, you are training the network from scratch, and MNIST (60000 images) is very little data for that. Furthermore, each MNIST image is only 28x28x1 in size.
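
For a rough sense of scale, here is a quick count of the weights in the network from the question (my own back-of-the-envelope arithmetic, using the filter shapes from the code above; with SAME padding the spatial size goes 28 -> 7 (conv1, stride 4) -> 4 (pool1) -> 2 (pool2) -> 1 (pool5), so pool5 flattens to 1*1*256 = 256 features):

import numpy as np
shapes = [
    (11, 11, 1, 96),   # conv1
    (5, 5, 96, 256),   # conv2
    (3, 3, 256, 384),  # conv3
    (3, 3, 384, 384),  # conv4
    (3, 3, 384, 256),  # conv5
    (256, 4096),       # fc1 (pool5 flattened to 256)
    (4096, 4096),      # fc2
    (4096, 10),        # fc3
]
print(sum(int(np.prod(s)) for s in shapes))  # 21589344, about 21.6M weights

That is roughly 21.6 million weights (plus biases) to fit from only 60000 tiny images.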

One option I can suggest is to fine-tune a pre-trained model, i.e., to do transfer learning. Take a look at this AlexNet weights file. Its architecture is slightly different from the one in your code. The benefit of this approach is that it offsets the small amount of data you have.
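
If you go the transfer-learning route, note that the MNIST input first has to be adapted to what the pretrained AlexNet filters expect. A minimal preprocessing sketch, assuming the canonical 227x227x3 AlexNet input size:

import tensorflow as tf

x = tf.placeholder(shape=[None, 784], dtype=tf.float32)
x_image = tf.reshape(x, [-1, 28, 28, 1])
x_rgb = tf.image.grayscale_to_rgb(x_image)         # 1 channel -> 3 channels
x_big = tf.image.resize_images(x_rgb, [227, 227])  # upsample 28x28 -> 227x227

The pretrained convolutional layers can then be used as a fixed feature extractor, with only a freshly initialized classifier head trained on MNIST.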

The other, better option is to reduce the number of layers and the number of filters in each layer. That way you will be able to train the model from scratch, and it will also be very fast (assuming you don't have a GPU). Take a look at the LeNet-5 architecture.
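
As a sketch of what that could look like, here is a LeNet-5-style network built with the same conv2d/maxpool2d/fc_op helpers from the question (a rough adaptation with the common 6-16-120-84 layout, not the exact original LeNet-5; note that the final layer is kept linear, since softmax_cross_entropy_with_logits expects raw logits rather than ReLU outputs):

def lenet5(x, num_classes):
    conv1 = conv2d('conv1', x, [5, 5, 1, 6], [6])        # 28x28x6
    pool1 = maxpool2d('pool1', conv1)                    # 14x14x6
    conv2 = conv2d('conv2', pool1, [5, 5, 6, 16], [16])  # 14x14x16
    pool2 = maxpool2d('pool2', conv2)                    # 7x7x16
    resh = tf.reshape(pool2, [-1, 7 * 7 * 16])
    fc1 = fc_op('fc1', resh, 120)
    fc2 = fc_op('fc2', fc1, 84)
    # final layer without ReLU: logits must be able to take both signs
    w = tf.Variable(tf.truncated_normal([84, num_classes], stddev=0.1))
    b = tf.Variable(tf.zeros([num_classes]))
    return tf.matmul(fc2, w) + b

This is on the order of 100k parameters instead of ~21 million, which is a much better match for 60000 28x28 images.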

Hope this answer helps.
