How do I attach multiple files to a PDF?



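I have a list of objects, List = ['Doc1.xlsx','Doc2.csv','Doc3.pdf'], and a list of their names, List1 = ['Doc1_name.xlsx','Doc2_name.csv','Doc3_name.pdf']. I need to attach them to an existing PDF. I tried the code below, and it only works when I have a single attachment. Now I am trying to iterate over the attachments to attach all of them, but only the last object, 'Doc3.pdf', ends up attached in Final.pdf.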

from PyPDF2 import PdfFileReader, PdfFileWriter

fileReader = PdfFileReader('Existing_pdf.pdf', 'rb')
fileWriter = PdfFileWriter()
fileWriter.appendPagesFromReader(fileReader)
for j in range(len(List)):
    fileWriter.addAttachment(List1[j], List[j])
with open('Final.pdf', 'wb') as output_pdf:
    fileWriter.write(output_pdf)

It looks to me like the addAttachment method completely replaces the current attachments.

From pdf.py in the PyPDF2 GitHub repository:

def addAttachment(self, fname, fdata):
    file_entry = DecodedStreamObject()
    file_entry.setData(fdata)
    file_entry.update({
        NameObject("/Type"): NameObject("/EmbeddedFile")
    })
    efEntry = DictionaryObject()
    efEntry.update({ NameObject("/F"):file_entry })
    filespec = DictionaryObject()
    filespec.update({
        NameObject("/Type"): NameObject("/Filespec"),
        NameObject("/F"): createStringObject(fname),  # Perhaps also try TextStringObject
        NameObject("/EF"): efEntry
    })
    embeddedFilesNamesDictionary = DictionaryObject()
    embeddedFilesNamesDictionary.update({
        NameObject("/Names"): ArrayObject([createStringObject(fname), filespec])
    })
    embeddedFilesDictionary = DictionaryObject()
    embeddedFilesDictionary.update({
        NameObject("/EmbeddedFiles"): embeddedFilesNamesDictionary
    })
    # Update the root
    self._root_object.update({
        NameObject("/Names"): embeddedFilesDictionary
    })

I believe this part

self._root_object.update({
    NameObject("/Names"): embeddedFilesDictionary
})

replaces the existing attachments instead of adding to them.
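
To make the difference concrete, here is a minimal sketch that uses plain Python dicts as a stand-in for the PDF root object. It is not PyPDF2 code, and the function names attach_replacing and attach_appending are made up for illustration. The first function mimics what addAttachment does (overwrite "/Names" on every call); the second extends the existing name array instead.

```python
# Plain-dict stand-in for the PDF root object; this is NOT PyPDF2 code.
# attach_replacing / attach_appending are hypothetical names for illustration.

def attach_replacing(root, fname, filespec):
    """Mimic addAttachment: build a fresh /Names entry and overwrite the old one."""
    root.update({"/Names": {"/EmbeddedFiles": {"/Names": [fname, filespec]}}})


def attach_appending(root, fname, filespec):
    """Reuse the existing name array when there is one, otherwise create it."""
    names = (
        root.setdefault("/Names", {})
        .setdefault("/EmbeddedFiles", {})
        .setdefault("/Names", [])
    )
    names.extend([fname, filespec])


replaced, appended = {}, {}
for name in ["Doc1_name.xlsx", "Doc2_name.csv", "Doc3_name.pdf"]:
    attach_replacing(replaced, name, {"/F": name})
    attach_appending(appended, name, {"/F": name})

# Every second element of the array is a file name.
print(replaced["/Names"]["/EmbeddedFiles"]["/Names"][0::2])  # only the last name is left
print(appended["/Names"]["/EmbeddedFiles"]["/Names"][0::2])  # all three names are kept
```

With the replacing variant, each call throws away the array built by the previous call, which matches the symptom in the question.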

Edit: This script attached two .txt files for me. It uses the addAttachment method shown above, which I adjusted slightly so that multiple files can be attached.

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import DecodedStreamObject, NameObject, DictionaryObject, createStringObject, ArrayObject

def appendAttachment(myPdfFileWriterObj, fname, fdata):
    # The entry for the file
    file_entry = DecodedStreamObject()
    file_entry.setData(fdata)
    file_entry.update({NameObject("/Type"): NameObject("/EmbeddedFile")})
    # The Filespec entry
    efEntry = DictionaryObject()
    efEntry.update({ NameObject("/F"):file_entry })
    filespec = DictionaryObject()
    filespec.update({NameObject("/Type"): NameObject("/Filespec"),NameObject("/F"): createStringObject(fname),NameObject("/EF"): efEntry})
    if "/Names" not in myPdfFileWriterObj._root_object.keys():
        # No files attached yet. Create the entry for the root, as it needs a reference to the Filespec
        embeddedFilesNamesDictionary = DictionaryObject()
        embeddedFilesNamesDictionary.update({NameObject("/Names"): ArrayObject([createStringObject(fname), filespec])})
        embeddedFilesDictionary = DictionaryObject()
        embeddedFilesDictionary.update({NameObject("/EmbeddedFiles"): embeddedFilesNamesDictionary})
        myPdfFileWriterObj._root_object.update({NameObject("/Names"): embeddedFilesDictionary})
    else:
        # There are files already attached. Append the new file.
        myPdfFileWriterObj._root_object["/Names"]["/EmbeddedFiles"]["/Names"].append(createStringObject(fname))
        myPdfFileWriterObj._root_object["/Names"]["/EmbeddedFiles"]["/Names"].append(filespec)

fr = PdfFileReader('dummy.pdf','rb')
fw = PdfFileWriter()
fw.appendPagesFromReader(fr)
my_attach_files = ['test.txt','test2.txt']
for my_test in my_attach_files:
    with open(my_test, 'rb') as my_test_attachment:
        my_test_data = my_test_attachment.read()
    appendAttachment(fw, my_test, my_test_data)
with open('dummy_new.pdf','wb') as file:
    fw.write(file)
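
The else branch works because the "/Names" array under "/EmbeddedFiles" is a flat list that alternates file name and Filespec. As a rough sketch of that layout, using plain Python lists and dicts rather than PyPDF2 objects (the helper name paired_attachments is hypothetical), the array can be read back as pairs like this:

```python
def paired_attachments(flat_names):
    """Split a flat [name1, spec1, name2, spec2, ...] array into (name, spec) pairs."""
    if len(flat_names) % 2 != 0:
        raise ValueError("name array must hold name/filespec pairs")
    return list(zip(flat_names[0::2], flat_names[1::2]))


# Stand-in for the array after two appendAttachment calls (plain dicts, not Filespec objects).
flat = ["test.txt", {"/F": "test.txt"}, "test2.txt", {"/F": "test2.txt"}]
for name, spec in paired_attachments(flat):
    print(name, spec["/F"])
```

Note that the PDF specification describes the keys of a name tree as sorted. Neither the script above nor this sketch sorts them, so a strict viewer may care about the order in which files are appended.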

Hope this works for you.

Disclaimer: I am the author of borb, the library used in this answer.

In borb, the Document class has a method add_embedded_file that takes a file name (which will be displayed in the PDF viewer) and the file's bytes.

This short snippet shows how to add an embedded file to an existing PDF:

from borb.pdf import Document
from borb.pdf import PDF
import typing

doc: typing.Optional[Document] = None
with open("input.pdf", "rb") as fh:
    doc = PDF.loads(fh)
# The next line adds an embedded file to the PDF.
# In order to keep this example short, I've used an inline byte string
# but you can of course read a file, and use those bytes
doc.add_embedded_file("name.json", b"{}")
# store
with open("output.pdf", "wb") as fh:
    PDF.dumps(fh, doc)
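
For the several files in the question, a possible sketch is to call add_embedded_file once per file. The helper below (embed_files is a made-up name, not part of borb) only assumes that doc has the add_embedded_file(name, bytes) method described above. It also assumes that repeated calls accumulate attachments rather than replace them, which is not verified here.

```python
from pathlib import Path


def embed_files(doc, paths):
    """Call doc.add_embedded_file(name, data) once per path; return the names used.

    `doc` is assumed to be a borb Document loaded with PDF.loads, but any
    object with a matching add_embedded_file method will do.
    """
    names = []
    for path in map(Path, paths):
        # Use the bare file name as the display name in the PDF viewer.
        doc.add_embedded_file(path.name, path.read_bytes())
        names.append(path.name)
    return names

# Possible usage between PDF.loads and PDF.dumps in the snippet above:
# embed_files(doc, ["Doc1.xlsx", "Doc2.csv", "Doc3.pdf"])
```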
