OCR files on Alfresco using pypdfocr



I can't get pypdfocr to OCR files in Alfresco.

Hi everyone, I'm just getting started with Alfresco and I'm having trouble configuring and using pypdfocr in it.

I installed Alfresco on Ubuntu 18.04.5 LTS using:

wget https://download.alfresco.com/release/community/201707-build-00028/alfresco-community-installer-201707-linux-x64.bin

I did all the required configuration and added the repo and share JAR files to their respective folders:

/opt/alfresco-community/modules/platform/simple-ocr-repo-2.3.1.jar
/opt/alfresco-community/modules/share/simple-ocr-share-2.3.1.jar

I added these properties to alfresco-global.properties:

# PYPDFOCR
ocr.command = /opt/alfresco-community/scripts/ocr.sh
ocr.output.verbose = true
ocr.output.file.prefix.command =
ocr.extra.commands = -v -l por
ocr.server.os = linux

I created the script that ocr.command points to above:

#!/usr/bin/env bash
# set -o xtrace # Uncomment for debugging / troubleshooting
array=("$@")
unset "array[${#array[@]}-1]"
/usr/local/bin/pypdfocr "${array[@]}"
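
For anyone reading the script: the unset line drops the last argument before calling pypdfocr, presumably because pypdfocr does not take a separate output-file argument. A minimal sketch of that behaviour, where drop_last is a hypothetical helper used only for illustration and nothing is actually OCRed:

```shell
#!/usr/bin/env bash
# Illustration only: drop_last is a hypothetical helper, not part of the OCR module.
drop_last() {
  local array=("$@")
  # Remove the final element (the output-file argument in ocr.sh).
  unset "array[${#array[@]}-1]"
  # Print what would be handed to pypdfocr, one argument per line.
  printf '%s\n' "${array[@]}"
}

drop_last -v -l por in.pdf out.pdf
```

Note that the spacing matters: with spaces around = or inside ${...}, bash does not treat these lines as an array assignment or expansion at all.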

I installed the dependencies like this: apt-get install gcc libjpeg-dev minizip zlib1g-dev python-dev

However, when I try to run OCR from inside Alfresco, I get the following message in /tomcat/logs/:

catalina.out

Any help would be greatly appreciated.

Edit: I tried to fix this by installing more dependencies, without success:

apt-get install wget gcc gcc-c++ make autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel ocaml ImageMagick ImageMagick-devel

I get the following messages:

E: Unable to locate package gcc-c+
E: Couldn't find any package by regex 'gcc-c+'
E: Unable to locate package libjpeg-devel
E: Unable to locate package libpng-devel
E: Unable to locate package libtiff-devel
E: Unable to locate package zlib-devel
E: Unable to locate package ImageMagick
E: Unable to locate package ImageMagick-devel
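
The "Unable to locate package" errors above are consistent with RPM-style (CentOS) package names being passed to apt. As a rough sketch, assuming Ubuntu 18.04, the usual Debian-style counterparts can be looked up as below; to_ubuntu_pkg is a hypothetical helper and the script only prints an install command, it does not run it:

```shell
#!/usr/bin/env bash
# to_ubuntu_pkg is a hypothetical helper: it maps an RPM-style package
# name to its usual Ubuntu counterpart and passes other names through.
to_ubuntu_pkg() {
  case "$1" in
    gcc-c++)           echo "g++" ;;
    libjpeg-devel)     echo "libjpeg-dev" ;;
    libpng-devel)      echo "libpng-dev" ;;
    libtiff-devel)     echo "libtiff-dev" ;;
    zlib-devel)        echo "zlib1g-dev" ;;
    ImageMagick)       echo "imagemagick" ;;
    ImageMagick-devel) echo "libmagickwand-dev" ;;
    *)                 echo "$1" ;;
  esac
}

# Build the apt-get command line for the names that failed above.
pkgs=""
for name in gcc-c++ libjpeg-devel libpng-devel libtiff-devel zlib-devel ImageMagick ImageMagick-devel; do
  pkgs="$pkgs $(to_ubuntu_pkg "$name")"
done
echo "sudo apt-get install -y$pkgs"
```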

It looks like you need to install ImageMagick and/or poppler-utils.

To install ImageMagick, follow: https://www.tutorialspoint.com/how-to-install-imagemagick-on-ubuntu

To install poppler-utils: sudo apt-get install -y poppler-utils

Note: you need more dependencies for this OCR module to work properly. Specifically, the following:

Tesseract and Leptonica: https://medium.com/@jjagadish.in/install-tessract-3-04-on-centos-7-4573465d8867

And the following packages:

epel-release
python-pip
gcc
libjpeg
minizip
zlib
python
ghostscript

Once pip is installed, you need to install pypdfocr and pyyaml:

pip install pypdfocr
pip install pyyaml

I suggest first getting it to work on the command line with a sample PDF:

/opt/alfresco-community/scripts/ocr.sh -v -l por test.pdf test.pdf
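
Before running that test, it may help to confirm that the external tools are actually on the PATH. A minimal sketch, where missing_tools is a hypothetical helper and the tool names (tesseract, gs for Ghostscript, convert for ImageMagick, pdftoppm for poppler-utils) are assumptions based on the dependencies mentioned above:

```shell
#!/usr/bin/env bash
# missing_tools is a hypothetical helper: it prints each named command
# that cannot be found on PATH and prints nothing if all are present.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool names are assumptions based on the dependencies discussed above.
missing_tools pypdfocr tesseract gs convert pdftoppm
```

If this prints nothing in your shell but OCR still fails inside Alfresco, it is worth running the same check as the user that Tomcat runs under, since its PATH may differ.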
