我遇到了一些问题,可以在CentOS上安装Python 3.6(Anaconda 5.1.0)中的pdftotext
。
首先一些快速笔记:
- 我在VirtualBox上使用CentOS 6.7
- 我知道它 can 可以工作,因为我的IT组已安装在我们的服务器上。注意:我发现我们的服务器 did 已安装了C 包装器,我正在尝试弄清楚如何获得。
- 我正在尝试获取现有的应用程序,因此我目前不是在寻找
pdftotext
的替代方案。
我遵循GitHub repo的说明,已经尝试了此步骤:
fedora,红帽和朋友:
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config
,但问题似乎围绕着流行音乐-CPP开关。我在yum search poppler
中没有看到该软件包:
============================= N/S Matched: poppler =============================
poppler-devel.i686 : Libraries and headers for poppler
poppler-devel.x86_64 : Libraries and headers for poppler
poppler-glib.i686 : Glib wrapper for poppler
poppler-glib.x86_64 : Glib wrapper for poppler
poppler-qt.i686 : Qt3 wrapper for poppler
poppler-qt.x86_64 : Qt3 wrapper for poppler
poppler-qt4.i686 : Qt4 wrapper for poppler
poppler-qt4.x86_64 : Qt4 wrapper for poppler
poppler.i686 : PDF rendering library
poppler.x86_64 : PDF rendering library
poppler-data.noarch : Encoding files
poppler-glib-devel.i686 : Development files for glib wrapper
poppler-glib-devel.x86_64 : Development files for glib wrapper
poppler-qt-devel.i686 : Development files for Qt3 wrapper
poppler-qt-devel.x86_64 : Development files for Qt3 wrapper
poppler-qt4-devel.i686 : Development files for Qt4 wrapper
poppler-qt4-devel.x86_64 : Development files for Qt4 wrapper
poppler-utils.x86_64 : Command line utilities for converting PDF files
我的IT组给了我他们尝试的内容的说明,我尝试安装poppler-devel
和poppler-glib
。但是每次尝试pip install pdftotext
时,我都会收到以下输出:
[root@localhost stack]# pip install pdftotext
Collecting pdftotext
Using cached https://files.pythonhosted.org/packages/21/35/60094dbadd9de2035873390b1cac25e01da605844eba6a07a53a82fa4adc/pdftotext-2.1.1.tar.gz
Building wheels for collected packages: pdftotext
Building wheel for pdftotext (setup.py) ... error
Complete output from command /root/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-1mu2f1n2/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('rn', 'n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-khm9zova --python-tag cp36:
/root/anaconda3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
warnings.warn(msg)
running bdist_wheel
running build
running build_ext
building 'pdftotext' extension
creating build
creating build/temp.linux-x86_64-3.6
gcc -pthread -B /root/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPOPPLER_CPP_AT_LEAST_0_30_0=0 -I/root/anaconda3/include/python3.6m -c pdftotext.cpp -o build/temp.linux-x86_64-3.6/pdftotext.o -Wall
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
pdftotext.cpp:3:42: error: poppler/cpp/poppler-document.h: No such file or directory
pdftotext.cpp:4:40: error: poppler/cpp/poppler-global.h: No such file or directory
pdftotext.cpp:5:38: error: poppler/cpp/poppler-page.h: No such file or directory
pdftotext.cpp:20: error: ‘poppler’ has not been declared
pdftotext.cpp:20: error: ISO C++ forbids declaration of ‘document’ with no type
pdftotext.cpp:20: error: expected ‘;’ before ‘*’ token
pdftotext.cpp: In function ‘void PDF_clear(PDF*)’:
pdftotext.cpp:26: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp:27: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘int PDF_create_doc(PDF*)’:
pdftotext.cpp:66: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp:66: error: ‘poppler’ has not been declared
pdftotext.cpp:67: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘int PDF_unlock(PDF*, char*)’:
pdftotext.cpp:75: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘int PDF_init(PDF*, PyObject*, PyObject*)’:
pdftotext.cpp:105: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘PyObject* PDF_read_page(PDF*, int)’:
pdftotext.cpp:119: error: ‘poppler’ has not been declared
pdftotext.cpp:119: error: expected initializer before ‘*’ token
pdftotext.cpp:120: error: ‘poppler’ has not been declared
pdftotext.cpp:120: error: expected ‘;’ before ‘layout_mode’
pdftotext.cpp:123: error: ‘page’ was not declared in this scope
pdftotext.cpp:123: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp:129: error: ‘poppler’ has not been declared
pdftotext.cpp:129: error: expected initializer before ‘rect’
pdftotext.cpp:130: error: ‘rect’ was not declared in this scope
pdftotext.cpp:133: error: ‘layout_mode’ was not declared in this scope
pdftotext.cpp:133: error: ‘poppler’ has not been declared
pdftotext.cpp:135: error: ‘poppler’ has not been declared
pdftotext.cpp:137: error: ‘poppler’ has not been declared
pdftotext.cpp:138: error: type ‘<type error>’ argument given to ‘delete’, expected pointer
error: command 'gcc' failed with exit status 1
----------------------------------------
Failed building wheel for pdftotext
Running setup.py clean for pdftotext
Failed to build pdftotext
Installing collected packages: pdftotext
Running setup.py install for pdftotext ... error
Complete output from command /root/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-1mu2f1n2/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('rn', 'n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ghuhvuhl/install-record.txt --single-version-externally-managed --compile:
/root/anaconda3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
warnings.warn(msg)
running install
running build
running build_ext
building 'pdftotext' extension
creating build
creating build/temp.linux-x86_64-3.6
gcc -pthread -B /root/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPOPPLER_CPP_AT_LEAST_0_30_0=0 -I/root/anaconda3/include/python3.6m -c pdftotext.cpp -o build/temp.linux-x86_64-3.6/pdftotext.o -Wall
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
pdftotext.cpp:3:42: error: poppler/cpp/poppler-document.h: No such file or directory
pdftotext.cpp:4:40: error: poppler/cpp/poppler-global.h: No such file or directory
pdftotext.cpp:5:38: error: poppler/cpp/poppler-page.h: No such file or directory
pdftotext.cpp:20: error: ‘poppler’ has not been declared
pdftotext.cpp:20: error: ISO C++ forbids declaration of ‘document’ with no type
pdftotext.cpp:20: error: expected ‘;’ before ‘*’ token
pdftotext.cpp: In function ‘void PDF_clear(PDF*)’:
pdftotext.cpp:26: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp:27: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘int PDF_create_doc(PDF*)’:
pdftotext.cpp:66: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp:66: error: ‘poppler’ has not been declared
pdftotext.cpp:67: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘int PDF_unlock(PDF*, char*)’:
pdftotext.cpp:75: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘int PDF_init(PDF*, PyObject*, PyObject*)’:
pdftotext.cpp:105: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp: In function ‘PyObject* PDF_read_page(PDF*, int)’:
pdftotext.cpp:119: error: ‘poppler’ has not been declared
pdftotext.cpp:119: error: expected initializer before ‘*’ token
pdftotext.cpp:120: error: ‘poppler’ has not been declared
pdftotext.cpp:120: error: expected ‘;’ before ‘layout_mode’
pdftotext.cpp:123: error: ‘page’ was not declared in this scope
pdftotext.cpp:123: error: ‘struct PDF’ has no member named ‘doc’
pdftotext.cpp:129: error: ‘poppler’ has not been declared
pdftotext.cpp:129: error: expected initializer before ‘rect’
pdftotext.cpp:130: error: ‘rect’ was not declared in this scope
pdftotext.cpp:133: error: ‘layout_mode’ was not declared in this scope
pdftotext.cpp:133: error: ‘poppler’ has not been declared
pdftotext.cpp:135: error: ‘poppler’ has not been declared
pdftotext.cpp:137: error: ‘poppler’ has not been declared
pdftotext.cpp:138: error: type ‘<type error>’ argument given to ‘delete’, expected pointer
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/root/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-1mu2f1n2/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('rn', 'n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ghuhvuhl/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-1mu2f1n2/pdftotext/
我认为这里的问题是它正在寻找C 编译的文件,而我只能得到glib?
我可以看什么?
pdftotext 应该在 poppler-utils 中,所以尝试yum install poppler-utils
编辑: hmm。有一个称为 pypoppler 的软件包可用于EPEL存储库中的CentOS 6,该软件包将其描述为" Poppler PDF渲染库的Python绑定"。我发现没有迹象表明它包括 poppler/cpp/{nothits} ,但是您可以尝试一下。(您可能需要首先安装 pycairo 。)
失败,您可以尝试安装 pdftotext (例如pip install pdftotext==1.0.0
)的早期版本以找到与Centos 6兼容的6.帮助。
我不认为您有兴趣升级到Centos 7?
实际上有一个适当的Centos 7软件包。安装Poppler-Devel软件包还不够,因为它不包括所需的CPP标头文件,您只需要安装Poppler-CPP和Poppler-CPP-Devel软件包:
yum安装poppler-cpp poppler-cpp-devel
之后,您可以pip/pip3安装pdftotext软件包,而无需自定义编译和设置环境路径变量。
我找到了解决方案。通过按照此链接安装libpoppler-cpp
的说明,我能够成功安装pdftotext
。
按照此存储库中的说明:
centos
在CENTOS上,
libpoppler-cpp
库不包含在系统中,因此我们需要从源构建。请注意,最新版本的Poppler需要C 11,这在CentOS上不可用,因此我们构建了libpoppler的较旧版本。# Build dependencies yum install wget xz libjpeg-devel openjpeg2-devel # Download and extract wget https://poppler.freedesktop.org/poppler-0.47.0.tar.xz tar -Jxvf poppler-0.47.0.tar.xz cd poppler-0.47.0 # Build and install ./configure make sudo make install
默认情况下,库将安装在
/usr/local/lib
和/usr/local/include
中。在CentOS上,这不是默认搜索路径,因此我们需要将PKG_CONFIG_PATH
和LD_LIBRARY_PATH
设置为正确的目录:export LD_LIBRARY_PATH="/usr/local/lib" export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig"