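How to use the Keras OCR example

I found examples/image_ocr.py, which seems to be meant for OCR. Hence it should be possible to give the model an image and receive text. However, I have no idea how to do so. How do I feed the model a new image? Which kind of preprocessing is necessary?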

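What I did

Setting up the example:

Install cairocffi: sudo apt-get install python-cairocffi
Install editdistance: sudo -H pip install editdistance
Change train to return the model and save the trained model.
Run the script to train the model.

Now I have a model.h5. What's next?

See https://github.com/MartinThoma/algorithms/tree/master/ML/ocr/keras for my current code. I know how to load the model (see below) and this seems to work. The problem is that I don't know how to feed new scans of images with text to the model.

What I tried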

#!/usr/bin/env python
from keras import backend as K
import keras
from keras.models import load_model
import os
from image_ocr import ctc_lambda_func, create_model, TextImageGenerator
from keras.layers import Lambda
from keras.utils.data_utils import get_file
import scipy.ndimage
import numpy

img_h = 64
img_w = 512
pool_size = 2
words_per_epoch = 16000
val_split = 0.2
val_words = int(words_per_epoch * (val_split))
if K.image_data_format() == 'channels_first':
    input_shape = (1, img_w, img_h)
else:
    input_shape = (img_w, img_h, 1)

fdir = os.path.dirname(get_file('wordlists.tgz',
                                origin='http://www.mythic-ai.com/datasets/wordlists.tgz', untar=True))

img_gen = TextImageGenerator(monogram_file=os.path.join(fdir, 'wordlist_mono_clean.txt'),
                             bigram_file=os.path.join(fdir, 'wordlist_bi_clean.txt'),
                             minibatch_size=32,
                             img_w=img_w,
                             img_h=img_h,
                             downsample_factor=(pool_size ** 2),
                             val_split=words_per_epoch - val_words
                             )
print("Input shape: {}".format(input_shape))
model, _, _ = create_model(input_shape, img_gen, pool_size, img_w, img_h)

model.load_weights("my_model.h5")

x = scipy.ndimage.imread('example.png', mode='L').transpose()
x = x.reshape(x.shape + (1,))

# Does not work
print(model.predict(x))

This gives

2017-07-05 22:07:58.695665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
File "eval_example.py", line 45, in <module>
print(model.predict(x))
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1567, in predict
check_batch_axis=False)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 106, in _standardize_input_data
'Found: array with shape ' + str(data.shape))
ValueError: The model expects 4 arrays, but only received one array. Found: array with shape (512, 64, 1)

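Well, I will try to answer everything you asked here:

As commented in the OCR code, Keras doesn't support losses with multiple parameters, so the example calculates the NN loss in a lambda layer. What does this mean in this case?

The neural network may look confusing because it uses 4 inputs ([input_data, labels, input_length, label_length]) and loss_out as output. Besides input_data, everything else is information used only for calculating the loss, which means it is only needed for training. What we want is something like line 468 of the original code: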

Model(inputs=input_data, outputs=y_pred).summary()

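This means "I have an image as input, please tell me what is written here". So how do we achieve it?

1) Keep the original training code as it is and train normally;

2) After training, save this model, Model(inputs=input_data, outputs=y_pred), in a .h5 file so it can be loaded wherever you want;

3) Do the prediction: if you take a look at the code, the input image is inverted and translated, so you can use this code to simplify things: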

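import numpy as np
# Note: scipy.misc.imread and imresize were deprecated in SciPy 1.0 and
# removed in later releases; this snippet needs an older SciPy (with Pillow).
from scipy.misc import imread, imresize

# Use the width and height from your neural network here
# (img_w and img_h in the question's script).
width, height = 512, 64

def load_for_nn(img_file):
    image = imread(img_file, flatten=True)      # read as grayscale
    image = imresize(image, (height, width))    # resize to the network's input size
    image = image.T                             # the network expects (width, height)
    # change 1 to the number of images you want to predict; here just one
    images = np.ones((1, width, height))
    images[0] = image
    images = images[:, :, :, np.newaxis]        # add the channel axis
    images /= 255                               # scale pixel values to [0, 1]
    return images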
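
The reshaping steps in load_for_nn can be checked in isolation with NumPy alone. The sketch below is only an illustration: the helper name to_model_batch is hypothetical, it assumes a channels_last backend, and it assumes the image has already been read as grayscale and resized to (height, width), with the defaults taken from the question's img_w and img_h.

```python
import numpy as np


def to_model_batch(gray, width=512, height=64):
    """Turn one grayscale image of shape (height, width) into a batch of
    shape (1, width, height, 1) with values scaled to [0, 1]."""
    gray = np.asarray(gray, dtype="float32")
    if gray.shape != (height, width):
        raise ValueError(
            "expected shape {}, got {}".format((height, width), gray.shape))
    # Transpose to (width, height), then add the batch and channel axes.
    return gray.T[np.newaxis, :, :, np.newaxis] / 255.0
```

Printing the shape of the result before calling model.predict is a cheap way to catch a missing batch or channel axis, which is part of what went wrong in the question's attempt.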

After loading the image, let's do the prediction:

def predict_image(image_path):  # insert the path of your image
    image = load_for_nn(image_path)  # load with the snippet above
    raw_word = model.predict(image)  # do the prediction with the neural network
    # The output of our neural network is only numbers. Use decode_output
    # from image_ocr.py to get the desired string.
    final_word = decode_output(raw_word)[0]
    return final_word
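
To make the "only numbers" remark concrete, here is a minimal sketch of greedy (best-path) CTC decoding. It is not the code from image_ocr.py. It assumes that the network output has shape (batch, time_steps, classes), that the alphabet is the lowercase letters plus space, that the CTC blank is the last class, and that the first two time steps are discarded; all of these should be checked against your copy of the example. The name greedy_ctc_decode is hypothetical.

```python
import itertools

import numpy as np

# Assumption: the alphabet used during training; the blank is one index past it.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
BLANK = len(ALPHABET)


def greedy_ctc_decode(y_pred, skip_steps=2):
    """Decode softmax outputs of shape (batch, time_steps, classes) to strings."""
    words = []
    for sample in y_pred:
        # Most likely class at every time step.
        best = np.argmax(sample[skip_steps:], axis=1)
        # Collapse runs of the same class, then drop the blanks.
        collapsed = [k for k, _ in itertools.groupby(best)]
        words.append("".join(ALPHABET[k] for k in collapsed if k != BLANK))
    return words
```

Collapsing repeats before removing blanks is what lets a blank separate two identical letters, so a doubled letter survives decoding.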

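This should be enough. From my experience, the images used in training are not good enough to make good predictions. If necessary, I will release code using other datasets that improved my results.

Answering the related questions:

What is CTC? Connectionist Temporal Classification?

It is a technique used to improve sequence classification. The original paper shows that it improves results on recognizing what is said in audio. In this case, the sequence is a sequence of characters. The explanation is a bit tricky, but you can find a good one here.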

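Are there algorithms which reliably detect the rotation of a document?

I am not sure, but you could take a look at attention mechanisms in neural networks. I don't have a good link right now, but I know this could be the case.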

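Are there algorithms which reliably detect lines / text blocks / tables / images (and hence make a reasonable segmentation)? I guess edge detection with smoothing and line-wise histograms already works reasonably well for that?

OpenCV implements Maximally Stable Extremal Regions (known as MSER). I really like the results of this algorithm: it is fast and was good enough for me when I needed it.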
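
The line-wise histogram idea from the question (not MSER) can be sketched with NumPy alone. This is only an illustration under simple assumptions: dark text on a light, roughly deskewed page, and a fixed darkness threshold. The function name find_text_rows and its parameters are hypothetical.

```python
import numpy as np


def find_text_rows(gray, ink_threshold=128, min_ink_pixels=1):
    """Return (start, end) row ranges that contain ink, end exclusive.

    gray is a 2-D array of pixel intensities (0 = black, 255 = white).
    """
    # Row-wise histogram: number of dark pixels in each image row.
    ink_per_row = (np.asarray(gray) < ink_threshold).sum(axis=1)
    has_ink = ink_per_row >= min_ink_pixels

    rows = []
    start = None
    for y, flag in enumerate(has_ink):
        if flag and start is None:
            start = y                      # a text line begins
        elif not flag and start is not None:
            rows.append((start, y))        # the text line ended on the previous row
            start = None
    if start is not None:                  # ink reaches the bottom edge
        rows.append((start, len(has_ink)))
    return rows
```

On noisy scans, smoothing ink_per_row or raising min_ink_pixels would likely be needed before the row ranges become useful.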

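As I said before, I will release the code soon. When I do, I will edit this answer with a link to the repository, but I believe the information here is enough to get the example running.

Now I have a model.h5. What's next?

First, I should note that model.h5, as loaded in your script with load_weights, holds the weights of your network. If you also want to save the architecture of your network, you should save it as JSON, as in this example: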

model_json = model.to_json()
with open("model_arch.json", "w") as json_file:
    json_file.write(model_json)

Now, once you have your model and its weights, you can load them on demand by doing the following:

from keras.models import model_from_json
from keras.optimizers import SGD

with open('model_arch.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
# if you already have a loaded model and don't need to save, start from here
loaded_model.load_weights("model.h5")
# compile loaded model with certain specifications
sgd = SGD(lr=0.01)  # newer Keras versions name this argument learning_rate
loaded_model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])

Then, with that loaded_model, you can proceed to predict the classification of some input like this:

prediction = loaded_model.predict(some_input, batch_size=20, verbose=0)

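This will return the classification of that input.

About the side questions:

CTC seems to be a term defined in the paper you refer to; quoting from it:

In the following, we refer to the task of labelling unsegmented data sequences as temporal classification (Kadous, 2002), and to our use of RNNs for this purpose as connectionist temporal classification (CTC).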

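To compensate for the rotation of a document, image, or similar, you can either generate more data from your current data by applying such transformations (take a look at this blog post that explains a way to do that), or you can use a convolutional neural network approach, which is actually what the Keras example you are using does, as we can see from that git:

This example uses a convolutional stack followed by a recurrent stack and a CTC logloss function to perform optical character recognition of generated text images.

You can check this tutorial, which is related to what you are doing and also explains more about convolutional neural networks.

Well, that is a broad question, but to detect lines you could use the Hough line transform, or Canny edge detection might also be a good option.

Edit: The error you are getting occurs because the model expects more inputs than the single one you passed. From the Keras docs we can see: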

predict(self, x, batch_size=32, verbose=0)

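Raises ValueError: In case of mismatch between the provided input data and the model's expectations, or in case a stateful model receives a number of samples that is not a multiple of the batch size.

Here, you have created a model that needs 4 inputs: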

model = Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)

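Your prediction attempt, on the other hand, loads just an image.
Hence the message: The model expects 4 arrays, but only received one array

From the code, the necessary inputs are: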

input_data = Input(name='the_input', shape=input_shape, dtype='float32')
labels = Input(name='the_labels', shape=[img_gen.absolute_max_string_len],dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')

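The original code and your training work because they use the TextImageGenerator. This generator takes care of giving you the four necessary inputs for the model.

So, what you have to do is predict using the generator. Just as you have the fit_generator() method for training with the generator, you also have the predict_generator() method for predicting with the generator.

Now, for a complete answer and solution, I would have to study your generator and see how it works (which would take me some time). But now that you know what needs to be done, you can probably figure it out.

You can either use the generator as it is and predict on a probably huge amount of data, or you can try to replicate a generator that yields just one or a few images together with the necessary labels, input lengths, and label lengths.

Or, if possible, just create the 3 remaining arrays manually, making sure they have the same shapes as the generator outputs (except for the first dimension, which is the batch size).

The one thing you must ensure, however, is this: there are 4 arrays, with the same shapes as the generator outputs except for the first dimension.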
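
A minimal sketch of creating the three remaining arrays by hand follows. Everything in it is an assumption to verify against your copy of image_ocr.py: the input names, a maximum label length of 16, a downsample factor of 4 (pool_size ** 2 in the question), and an input length of img_w // downsample_factor - 2. The helper name make_dummy_ctc_inputs is hypothetical.

```python
import numpy as np


def make_dummy_ctc_inputs(images, max_string_len=16, downsample_factor=4):
    """Build a dict of the four training-model inputs for a batch of images.

    images must have shape (batch, img_w, img_h, 1). The labels are dummies,
    so they only satisfy the shape check of the 4-input model.
    """
    batch, img_w = images.shape[0], images.shape[1]
    return {
        "the_input": images.astype("float32"),
        # Dummy labels: one row per image, padded to the maximum length.
        "the_labels": np.zeros((batch, max_string_len), dtype="float32"),
        # Number of time steps the recurrent stack emits per image (assumed).
        "input_length": np.full(
            (batch, 1), img_w // downsample_factor - 2, dtype="int64"),
        # Claimed length of each dummy label.
        "label_length": np.ones((batch, 1), dtype="int64"),
    }
```

Note that the 4-input model outputs loss_out, so predicting with these arrays returns a CTC loss against the dummy labels rather than text. To read text, a model that maps input_data to y_pred is still needed.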

Hi, you can check out my GitHub repository. You need to train the model for the type of images you want to run OCR on.

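# USE GOOGLE COLAB
import matplotlib.pyplot as plt
import keras_ocr

images = [keras_ocr.tools.read("/content/sample_data/IMG_20200224_113657.jpg")]  # image path
pipeline = keras_ocr.pipeline.Pipeline()
prediction = pipeline.recognize(images)

x_max = 0
temp_str = ""
# text file path to save the recognized text
with open("/content/sample_data/my_file.txt", "a+") as myfile:
    for word, box in prediction[0]:
        x_max_local = box[:, 0].max()  # right edge of this word's box
        if x_max_local > x_max:
            # still moving to the right: the word belongs to the current line
            x_max = x_max_local
            temp_str = temp_str + " " + word.ljust(15)
        else:
            # x jumped back to the left: write out the finished line
            temp_str = temp_str + "\n"
            myfile.write(temp_str)
            print(temp_str)
            # start the next line with the current word instead of dropping it
            x_max = x_max_local
            temp_str = " " + word.ljust(15)
    if temp_str:
        # write the last line, which no later word would flush
        myfile.write(temp_str + "\n")
        print(temp_str)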
