
Python PyPDF2: special characters are printed when printing text from a PDF file

Updated: 2023-09-17
Original English title: python PyPDF2 - Special characters are printing while trying to print text from pdf file?

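I am trying to print the text of a PDF file using the PyPDF2 module, but some special characters are printed instead.
I have already tried this solution, but it does not seem to work.
Code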

import PyPDF2

# Legacy PyPDF2 API (releases before 3.0), kept as in the original question
obj = open('/home/sarthak/Documents/UNIT-4.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(obj)
print(pdfReader.numPages)  # number of pages
pageObj = pdfReader.getPage(0)  # first page
# Also tried 'utf-8' instead of 'ascii'; it does not work either
print(pageObj.extractText().encode('ascii', 'ignore'))
obj.close()

Output

17
b'\n\n\n\n!#$\n\n\n\n\n\n\n\n\n\n\n  \n\n"%$\n\n\n"#\n\n\n $\n\n\n\'())(*+, -$&\n\n\n\n\n $&-\n $\n'
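
The b'...' wrapper in this output comes from printing a bytes object: .encode() turns the extracted string into bytes, and print() then shows the bytes literal, with newline characters written as escape sequences instead of real line breaks. A minimal sketch of that behaviour, using a made-up sample string (an assumption, not the real content of UNIT-4.pdf):

```python
# Hypothetical sample text standing in for what extractText() might return.
sample = "Unit 4\nCaf\u00e9 notes\n"

# encode() returns bytes; errors="ignore" silently drops non-ASCII characters,
# so the accented letter disappears from the result.
encoded = sample.encode("ascii", "ignore")

# Printing bytes shows a b'...' literal with newlines as \n escapes.
print(encoded)

# Printing the str itself renders the newlines as real line breaks.
print(sample)
```

Printing the string directly avoids the b'...' form, but it does not change which characters were extracted from the PDF in the first place.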

To remove the \n characters, you can pass the result through textacy.

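import textacy

# `section` is the text extracted from the PDF page (a str, not bytes)
# Call kept as in the original answer; newer textacy releases may expose it differently
data = textacy.preprocess.remove_punct(section, marks='\n')
print(data)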

where section is the extracted data.

To install textacy: pip install textacy
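
If installing an extra package only to strip newlines is not desirable, the same cleanup can be sketched with the standard library. This is a swapped-in technique, not the textacy call from the answer: the helper name collapse_whitespace and the sample text are hypothetical, and the input is assumed to be a str rather than bytes.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (including newlines) with one space."""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical extracted text, not taken from the real PDF.
section = "Unit 4\n\n\nIntroduction\n to  parsing\n"
data = collapse_whitespace(section)
print(data)
```

This only tidies whitespace. It does not recover text that was extracted as unrelated symbols such as !#$.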
