Adding only user words in Tesseract

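I am using Tesseract in my Android app. I defined a "user words" file and added the line in bold (marked with **) so that OCR takes the user-words file into account.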

String language = "deu";
datapath = getFilesDir()+ "/tesseract/";
Tess = new TessBaseAPI();
checkFile(new File(datapath + "tessdata/"));
**Tess.setVariable("user_words_suffix","deu.user-words");**
Tess.init(datapath, language);

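I did not define a user-patterns file because my images do not contain any specific pattern. I simply copied the deu.user-words file, a UTF-8 text file, into the tessdata folder. Is that enough for the OCR configuration? Or should I unpack deu.traineddata, add this file to it, and then repack it? If so, could you give me some hints on how to do that?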

You don't need to specify the language prefix in the code:

Tess.setVariable("user_words_suffix", "user-words");

Make sure the file's prefix matches the specified language code, that is, the file should be named deu.user-words.
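To illustrate the naming rule, here is a minimal sketch in plain Java. It assumes that Tesseract looks for the word list at `<datapath>/tessdata/<language>.<suffix>`, which is what the advice above implies. `UserWordsPath` and `resolve` are hypothetical names used only for this illustration; they are not part of tess-two or Tesseract.

```java
import java.io.File;

// Hypothetical helper for illustration only; not part of tess-two or Tesseract.
final class UserWordsPath {
    private UserWordsPath() {}

    // Builds <datapath>/tessdata/<language>.<suffix>.
    // Assumption: this is where the word list is expected when the
    // user_words_suffix variable is set to <suffix>.
    static File resolve(String datapath, String language, String suffix) {
        File tessdata = new File(datapath, "tessdata");
        return new File(tessdata, language + "." + suffix);
    }
}
```

Under that assumption, `UserWordsPath.resolve(datapath, "deu", "user-words")` names the file `deu.user-words`, whereas passing `"deu.user-words"` as the suffix yields `deu.deu.user-words`, which would explain why the setting in the question may not pick up the file. Checking `resolve(...).exists()` before calling `init` is one way to catch a misplaced file early.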

https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
https://github.com/tesseract-ocr/tesseract/wiki/controlparams
