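Recognizing digits from an image using Python and TensorFlow

Details: Ubuntu 14.04 (LTS), OpenCV 2.4.13, Spyder 2.3.9 (Python 2.7), TensorFlow r0.10

I want to recognize digits from an image using Python and TensorFlow (optionally OpenCV).

In addition, I want to use a model trained on the MNIST data with TensorFlow.

Like this (the code is based on the video on this page):

Code: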

import tensorflow as tf
import random
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# x holds flattened 28x28 images, y holds one-hot labels for the 10 digits
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1
### modeling ###
activation = tf.nn.softmax(tf.matmul(x, W) + b)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(activation), reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
### training ###
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
        # accumulate the mean cost over all batches of this epoch
        avg_cost += sess.run(cross_entropy, feed_dict={x: batch_xs, y: batch_ys}) / total_batch
    if epoch % display_step == 0:
        print "Epoch : ", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost)
print "Optimization Finished"
### predict number ###
# pick a random test image and compare the prediction with its label
r = random.randint(0, mnist.test.num_examples - 1)
print "Prediction: ", sess.run(tf.argmax(activation,1), {x: mnist.test.images[r:r+1]})
print "Correct Answer: ", sess.run(tf.argmax(mnist.test.labels[r:r+1], 1))
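For readers new to the model above, here is a minimal NumPy sketch of what the softmax and cross-entropy lines compute on a small batch. The helper names (softmax, cross_entropy) and the toy 3-class inputs are assumptions made for illustration; they are not part of the TensorFlow code.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum first so the exponentials cannot overflow
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    # Mean over the batch of -sum(y * log(p)) for each example
    return np.mean(-np.sum(one_hot_labels * np.log(probs), axis=1))

# Toy batch: 2 examples, 3 classes (hypothetical values)
logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = cross_entropy(probs, labels)
print(probs.sum(axis=1))  # each row of probabilities sums to 1
print(loss)
```

Each row of probs plays the role of one row of activation, and loss corresponds to the scalar that the optimizer minimizes.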

However, the problem is how to build a numpy array like the following one.

Code added:

mnist.test.images[r:r+1]

[[ 0.  0.  0. ...,  0.50196081  0.50196081  0.50196081  0.50196081  0.  0. ...,  0.50196081  1.  1.  1.  1.  0.50196081  0.25098041  0. ...,  0.  0.  0.]]

When I use OpenCV for this, I can build a numpy array from the image, but it looks a bit odd. (I want the array to be a 28x28 vector.)

Code added:

image = cv2.imread("img_easy.jpg")
resized_image = cv2.resize(image, (28, 28))

[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]
[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]
[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]
[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]
[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]
[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]

Then I put the value ('resized_image') into the TensorFlow code, like this:

Code modified:

### predict number ###
print "Prediction: ", sess.run(tf.argmax(activation,1), {x: resized_image})
print "Correct Answer: 9"

As a result, this line raises an error.

ValueError: Cannot feed value of shape (28, 28, 3) for Tensor u'Placeholder_2:0', which has shape '(?, 784)'

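Finally,

1) I want to know how to build data that can be fed into the TensorFlow code (probably a numpy array of shape [784]).

2) Do you know of any digit-recognition examples that use TensorFlow?

I am a beginner in machine learning.

Please tell me in detail what to do.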

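It seems the image you are using is a color (RGB) image, hence the third dimension in (28, 28, 3).

The original MNIST images, by contrast, are grayscale with a width and height of 28. That is why your x placeholder has shape [None, 784]: 28 * 28 = 784.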
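The shape mismatch can be reproduced without TensorFlow. The following is a small sketch using NumPy only; the helper name fits_placeholder is hypothetical and simply mirrors the (?, 784) shape of the x placeholder.

```python
import numpy as np

def fits_placeholder(array, feature_count=784):
    """Return True if `array` has shape (batch, feature_count)."""
    return array.ndim == 2 and array.shape[1] == feature_count

# Stand-ins for the arrays discussed here (all zeros, only the shape matters)
color_image = np.zeros((28, 28, 3), dtype=np.uint8)  # color image after resizing
gray_image = np.zeros((28, 28), dtype=np.uint8)      # grayscale image after resizing
flat_image = gray_image.reshape(1, 28 * 28)          # one row of 784 values

print(fits_placeholder(color_image))  # False: three dimensions
print(fits_placeholder(gray_image))   # False: 28 columns, not 784
print(fits_placeholder(flat_image))   # True
```

Only the flattened (1, 784) array matches, which is the same form as mnist.test.images[r:r+1] in the question.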

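cv2.imread is reading the image as a 3-channel color image (OpenCV stores the channels in BGR order), whereas you want it as grayscale, i.e. (28, 28) after resizing. You may find it helpful to request grayscale when calling imread: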

image = cv2.imread("img_easy.jpg", cv2.CV_LOAD_IMAGE_GRAYSCALE)

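With this, your image should have shape (28, 28) once it has been resized as before. It still needs to be flattened to (1, 784), for example with resized_image.reshape(1, 784), before it can be fed to the placeholder.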
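If the image has already been loaded in color, it can also be converted afterwards. Below is a rough NumPy sketch of the weighted sum behind a grayscale conversion, assuming OpenCV's BGR channel order; the function name bgr_to_gray is hypothetical. In practice cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) does this for you.

```python
import numpy as np

def bgr_to_gray(image_bgr):
    """Collapse a (H, W, 3) BGR uint8 image to a (H, W) uint8 grayscale image."""
    # Luma weights in B, G, R order: gray = 0.114*B + 0.587*G + 0.299*R
    weights = np.array([0.114, 0.587, 0.299])
    gray = np.dot(image_bgr.astype(np.float64), weights)
    return np.round(gray).astype(np.uint8)

# Stand-in for a resized all-white color image, like the output in the question
white = np.full((28, 28, 3), 255, dtype=np.uint8)
gray = bgr_to_gray(white)
print(gray.shape)  # (28, 28)
```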

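Also, the OpenCV image values (0-255) are not in the same range as the MNIST images shown in the question. You may need to normalize the values in the image so that they lie in the range 0-1.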
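Putting the normalization and flattening together, a minimal sketch could look like the following. The function name to_mnist_input and the invert flag are assumptions for illustration. Inversion is included because the array printed in the question is mostly 255 (a white background), while the MNIST sample is mostly 0; whether it is needed depends on your image.

```python
import numpy as np

def to_mnist_input(gray_image, invert=True):
    """Convert a 28x28 uint8 grayscale image to a (1, 784) float32 array in [0, 1]."""
    if gray_image.shape != (28, 28):
        raise ValueError("expected a 28x28 grayscale image, got %s" % (gray_image.shape,))
    # Scale 0-255 pixel values down to the 0-1 range used by MNIST
    scaled = gray_image.astype(np.float32) / 255.0
    if invert:
        # MNIST digits are bright on a dark background; flip a dark-on-white image
        scaled = 1.0 - scaled
    # One row of 784 values, matching the (?, 784) placeholder
    return scaled.reshape(1, 784)

# Stand-in for a resized grayscale image of a blank white page
white_page = np.full((28, 28), 255, dtype=np.uint8)
features = to_mnist_input(white_page)
print(features.shape)
```

The result could then be passed in the prediction line from the question as {x: features} in place of {x: resized_image}.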

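In addition, you may want to use a CNN (slightly more advanced, but it should give better results). See the tutorials at https://www.tensorflow.org/tutorials/ for more details.

Have you tried this? I ran into the same problem, and this helped me a lot.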

resized = cv2.resize(image, dsize = (28,28), interpolation = cv2.INTER_CUBIC)
