I can't import pdftotext on my new M1 Mac. The steps I took:
- Installed Python 3.10
- Installed the Command Line Developer Tools
- From the terminal:
pip3 install pdftotext
- Opened IDLE and typed:
import pdftotext
I get this error:
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    import pdftotext
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pdftotext.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace '_ZN7poppler24set_debug_error_functionEPFvRKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEPvES9'
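I've already spent several hours searching for this error message.
Any suggestions?
P.S. I've already tried several other PDF-to-text packages, but they don't read the whole PDF. For some odd reason, the PDFs I need to read are very complex, and many packages fail to read them completely. pdftotext does read them. So what I need is help getting pdftotext to work.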
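For anyone hitting the same import error, a small sketch like the one below may help narrow things down before reinstalling anything. It only reports which interpreter is running, which CPU architecture it reports, and where the compiled module lives. It assumes the cause could be an architecture mismatch or a Poppler library that does not match the compiled extension; the traceback alone does not confirm either. The helper name describe_environment is made up for illustration.

```python
import importlib.util
import platform
import sys


def describe_environment(module_name="pdftotext"):
    """Collect facts that help diagnose a failing extension-module import."""
    # find_spec locates a top-level module on disk without importing it,
    # so it works even when "import pdftotext" itself fails.
    spec = importlib.util.find_spec(module_name)
    return {
        "python_version": platform.python_version(),
        "machine": platform.machine(),  # typically "arm64" or "x86_64" on macOS
        "executable": sys.executable,
        "module_path": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported machine is x86_64 on an M1, the interpreter is probably running under Rosetta, which would be worth mentioning when asking for help.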
In my opinion the pdftotext library is not a good choice. PyPDF2 works better; here is an example:
import PyPDF2

# Open the PDF in binary read mode ("rb"); the with-block closes it afterwards
with open('1.pdf', 'rb') as pdffileobj:
    # Create a reader for the file object
    # (PdfReader is the current name; older PyPDF2 releases called it PdfFileReader)
    pdfreader = PyPDF2.PdfReader(pdffileobj)
    # Number of pages in this PDF file
    x = len(pdfreader.pages)
    # Page indexes start at 0, so the valid range is 0 to x-1 (x+1 would be out of range).
    # Extract the text of every page and join it into one string.
    text = "\n".join((pdfreader.pages[i].extract_text() or "") for i in range(x))

# Save the extracted text to a txt file.
# Don't forget to put r before the path so the backslashes are not treated as escapes.
# To get the path on Windows: right-click the file, click Properties,
# copy the location path, paste it here and append "\your_txtfilename".
with open(r"C:\Users\SIDDHI\AppData\Local\Programs\Python\Python38\1.txt", "a", encoding="utf-8") as file1:
    file1.write(text)
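Since the question says pdftotext is the only tool that reads these PDFs completely, another route that might be worth trying is to skip the Python binding and call Poppler's pdftotext command-line program through subprocess. This is only a sketch: it assumes Poppler is installed and its pdftotext executable is on PATH (on macOS it is commonly installed with Homebrew), and the function names build_command and extract_text_with_cli are made up for illustration.

```python
import shutil
import subprocess


def build_command(pdf_path, keep_layout=True):
    """Build the argument list for Poppler's pdftotext command-line tool."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    # "-enc UTF-8" sets the output encoding; a final "-" sends the text to stdout
    cmd += ["-enc", "UTF-8", pdf_path, "-"]
    return cmd


def extract_text_with_cli(pdf_path, keep_layout=True):
    """Return the text of a PDF by running the pdftotext executable."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext executable not found on PATH")
    result = subprocess.run(
        build_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext reports a failure
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text_with_cli("1.pdf") would then return the whole document as one string, which can be written to a file the same way as above.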