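I am on Ubuntu 20.04 and I am trying out C++ code that uses OCR to convert an image into a searchable PDF.
My code is a slightly modified version of the C++ API example provided on the official website:
/home/test/Desktop/Example2/testexample2.cpp: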
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <tesseract/renderer.h>
int main()
{
    //const char* input_image = "/usr/src/tesseract-oc/testing/phototest.tif";
    //const char* output_base = "my_first_tesseract_pdf";
    //const char* datapath = "/Projects/OCR/tesseract/tessdata";
    const char* input_image = "001.jpg";
    const char* output_base = "001";
    const char* datapath = ".";
    int timeout_ms = 5000;
    const char* retry_config = nullptr;
    bool textonly = false;
    int jpg_quality = 92;
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    if (api->Init(datapath, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    /*
    tesseract::TessPDFRenderer *renderer = new tesseract::TessPDFRenderer(
        output_base, api->GetDatapath(), textonly, jpg_quality);
    */
    tesseract::TessPDFRenderer *renderer = new tesseract::TessPDFRenderer(
        output_base, api->GetDatapath(), textonly);
    bool succeed = api->ProcessPages(input_image, retry_config, timeout_ms, renderer);
    if (!succeed) {
        fprintf(stderr, "Error during processing.\n");
        return EXIT_FAILURE;
    }
    api->End();
    return EXIT_SUCCESS;
}
I also followed https://stackoverflow.com/a/59382664, as shown below:
cd /home/test/Desktop/Example2
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tesseract/blob/master/tessdata/pdf.ttf
export TESSDATA_PREFIX=$(pwd)
gedit config
(In the config file, entered the contents:
tessedit_create_pdf 1 Write .pdf output file
tessedit_create_txt 1 Write .txt output file
)
g++ testexample2.cpp -o testexample2 -ltesseract
./testexample2
But when I run it, it shows the following error:
Warning: Invalid resolution 0 dpi. Using 70 instead.
Error during processing.
ObjectCache(0x7f1b096669c0)::~ObjectCache(): WARNING! LEAK! object 0x55af5c5241a0 still has count 1 (id /home/test/Desktop/Example2/eng.traineddatapunc-dawg)
ObjectCache(0x7f1b096669c0)::~ObjectCache(): WARNING! LEAK! object 0x55af5c506770 still has count 1 (id /home/test/Desktop/Example2/eng.traineddataword-dawg)
ObjectCache(0x7f1b096669c0)::~ObjectCache(): WARNING! LEAK! object 0x55af5c9a4a70 still has count 1 (id /home/test/Desktop/Example2/eng.traineddatanumber-dawg)
ObjectCache(0x7f1b096669c0)::~ObjectCache(): WARNING! LEAK! object 0x55af5c9a4980 still has count 1 (id /home/test/Desktop/Example2/eng.traineddatabigram-dawg)
ObjectCache(0x7f1b096669c0)::~ObjectCache(): WARNING! LEAK! object 0x55af5d7d5170 still has count 1 (id /home/test/Desktop/Example2/eng.traineddatafreq-dawg)
My directory structure is:
Example2
|------->001.jpg
|------->config
|------->eng.traineddata
|------->pdf.ttf
|------->testexample2
|------->testexample2.cpp
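I have searched multiple sources for this but could not find any fix.
Also, I would like to know whether there is a way to compile this code together with libtesseract into a standalone, portable binary, so that it runs on other Ubuntu systems without having to install the tesseract libraries and their dependencies there.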
You must free the dynamically allocated memory used for the "api" object.
Use:
... your code ...
if (renderer) delete renderer;
if (api) delete api;
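The tesseract API examples show how to use tesseract functionality; they do not cover every detail of your chosen programming language (C++ in your case).
Just from reading the code, without even running it: memory is dynamically allocated twice but never deallocated. Please try to fix those issues.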
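One way to guarantee that cleanup is to let std::unique_ptr own both objects, so they are also freed on the early `return EXIT_FAILURE` path, which manual `delete` calls at the end of main() would miss. The following is only a minimal sketch of that idea: `FakeApi`, `FakeRenderer`, `live_objects`, and `run` are hypothetical stand-ins invented for illustration, not part of tesseract. In the real program the two pointers would hold `tesseract::TessBaseAPI` and `tesseract::TessPDFRenderer` instead.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Hypothetical stand-ins for tesseract::TessBaseAPI and
// tesseract::TessPDFRenderer. They only count how many objects are alive.
static int live_objects = 0;

struct FakeApi {
    FakeApi() { ++live_objects; }
    ~FakeApi() { --live_objects; }
    // Mirrors the shape of ProcessPages: returns false on failure.
    bool process(bool should_succeed) const { return should_succeed; }
};

struct FakeRenderer {
    FakeRenderer() { ++live_objects; }
    ~FakeRenderer() { --live_objects; }
};

// Same control flow as the question's main(), including the early return.
int run(bool should_succeed) {
    std::unique_ptr<FakeApi> api(new FakeApi());
    std::unique_ptr<FakeRenderer> renderer(new FakeRenderer());
    assert(live_objects == 2);  // both objects exist at this point

    if (!api->process(should_succeed)) {
        // No manual delete needed: both unique_ptrs release their objects
        // when the function returns, renderer first, then api.
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

After either `run(true)` or `run(false)` returns, `live_objects` is back to 0, meaning nothing is left allocated on either path. Because locals are destroyed in reverse order of declaration, the renderer is released before the api object, matching the order of the two `delete` statements above.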