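I have a list of objects: List = ['Doc1.xlsx','Doc2.csv','Doc3.pdf'] and a list of their names: List1 = ['Doc1_name.xlsx','Doc2_name.csv','Doc3_name.pdf']. I need to attach them to an existing PDF. I tried the code below, which works only when I have a single attachment. Now I am trying to iterate over the attachments to attach all of them, but only the last object, 'Doc3.pdf', ends up attached in Final.pdf.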
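from PyPDF2 import PdfFileReader, PdfFileWriter

# PdfFileReader takes a path or stream; its second positional argument is `strict`, not a file mode
fileReader = PdfFileReader('Existing_pdf.pdf')
fileWriter = PdfFileWriter()
fileWriter.appendPagesFromReader(fileReader)
# Iterate over every index, starting at 0, so the first file is not skipped
for j in range(len(List)):
    fileWriter.addAttachment(List1[j], List[j])
with open('Final.pdf', 'wb') as output_pdf:
    fileWriter.write(output_pdf)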
It looks to me as though the addAttachment method completely replaces the current attachments.
From pdf.py in the PyPDF2 GitHub repository:
def addAttachment(self, fname, fdata):
    file_entry = DecodedStreamObject()
    file_entry.setData(fdata)
    file_entry.update({
        NameObject("/Type"): NameObject("/EmbeddedFile")
    })
    efEntry = DictionaryObject()
    efEntry.update({ NameObject("/F"):file_entry })
    filespec = DictionaryObject()
    filespec.update({
        NameObject("/Type"): NameObject("/Filespec"),
        NameObject("/F"): createStringObject(fname), # Perhaps also try TextStringObject
        NameObject("/EF"): efEntry
    })
    embeddedFilesNamesDictionary = DictionaryObject()
    embeddedFilesNamesDictionary.update({
        NameObject("/Names"): ArrayObject([createStringObject(fname), filespec])
    })
    embeddedFilesDictionary = DictionaryObject()
    embeddedFilesDictionary.update({
        NameObject("/EmbeddedFiles"): embeddedFilesNamesDictionary
    })
    # Update the root
    self._root_object.update({
        NameObject("/Names"): embeddedFilesDictionary
    })
I believe this is the part
    self._root_object.update({
        NameObject("/Names"): embeddedFilesDictionary
    })
that replaces the attachments instead of adding to them.
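The effect can be reproduced with ordinary Python dictionaries, since DictionaryObject is a dict subclass. The snippet below is only an illustration: attach_by_update is a hypothetical stand-in for the final update() call in addAttachment, and the filespec values are placeholder strings rather than real PDF objects.

```python
# Illustration with plain dicts; attach_by_update is a hypothetical stand-in
# for the last step of addAttachment, not PyPDF2 code.
def attach_by_update(root, fname, filespec):
    names_tree = {"/EmbeddedFiles": {"/Names": [fname, filespec]}}
    # update() rebinds the "/Names" key, so any tree stored earlier is dropped.
    root.update({"/Names": names_tree})


root = {}
attach_by_update(root, "Doc1_name.xlsx", "filespec-1")
attach_by_update(root, "Doc2_name.csv", "filespec-2")

# Only the entry from the most recent call is still reachable from the root.
remaining = root["/Names"]["/EmbeddedFiles"]["/Names"]
print(remaining)
```

Each call builds a fresh one-entry array and stores it under the same key, which matches the symptom that only the last file shows up.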
Edit: this script attaches two .txt files for me. It uses the addAttachment method shown above, which I adjusted slightly so that it can attach multiple files.
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import DecodedStreamObject, NameObject, DictionaryObject, createStringObject, ArrayObject

def appendAttachment(myPdfFileWriterObj, fname, fdata):
    # The entry for the file
    file_entry = DecodedStreamObject()
    file_entry.setData(fdata)
    file_entry.update({NameObject("/Type"): NameObject("/EmbeddedFile")})
    # The Filespec entry
    efEntry = DictionaryObject()
    efEntry.update({ NameObject("/F"):file_entry })
    filespec = DictionaryObject()
    filespec.update({NameObject("/Type"): NameObject("/Filespec"),NameObject("/F"): createStringObject(fname),NameObject("/EF"): efEntry})
    if "/Names" not in myPdfFileWriterObj._root_object.keys():
        # No files attached yet. Create the entry for the root, as it needs a reference to the Filespec
        embeddedFilesNamesDictionary = DictionaryObject()
        embeddedFilesNamesDictionary.update({NameObject("/Names"): ArrayObject([createStringObject(fname), filespec])})
        embeddedFilesDictionary = DictionaryObject()
        embeddedFilesDictionary.update({NameObject("/EmbeddedFiles"): embeddedFilesNamesDictionary})
        myPdfFileWriterObj._root_object.update({NameObject("/Names"): embeddedFilesDictionary})
    else:
        # There are files already attached. Append the new file.
        myPdfFileWriterObj._root_object["/Names"]["/EmbeddedFiles"]["/Names"].append(createStringObject(fname))
        myPdfFileWriterObj._root_object["/Names"]["/EmbeddedFiles"]["/Names"].append(filespec)

# The second positional argument of PdfFileReader is `strict`, not a file mode, so it is omitted here
fr = PdfFileReader('dummy.pdf')
fw = PdfFileWriter()
fw.appendPagesFromReader(fr)
my_attach_files = ['test.txt','test2.txt']
for my_test in my_attach_files:
    # Read each file as bytes; the data, not the path, is what gets embedded
    with open(my_test, 'rb') as my_test_attachment:
        my_test_data = my_test_attachment.read()
    appendAttachment(fw, my_test, my_test_data)
with open('dummy_new.pdf','wb') as file:
    fw.write(file)
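One possible caveat: as far as the PDF specification describes name trees, the keys in a /Names array are expected to be in sorted order, and appending at the end only preserves that when files happen to be added in name order. Below is a minimal sketch of a sorted insert into the flat [key1, value1, key2, value2, ...] layout. It uses plain Python lists and placeholder strings; insert_sorted is a hypothetical helper, not part of PyPDF2, and Python's string ordering is assumed to be close enough to the PDF ordering for ASCII file names.

```python
from bisect import bisect_right


def insert_sorted(names, fname, filespec):
    """Insert a (fname, filespec) pair into a flat [key, value, ...] list,
    keeping the keys in ascending order."""
    keys = names[0::2]                 # every even slot holds a key
    pos = bisect_right(keys, fname)    # index of the pair, counted in pairs
    # Each pair occupies two slots, so the list index is twice the pair index.
    names[2 * pos:2 * pos] = [fname, filespec]


names = []
insert_sorted(names, "test2.txt", "spec-b")
insert_sorted(names, "test.txt", "spec-a")
print(names)
```

Since ArrayObject is a list subclass, the same slice assignment would presumably work on the real /Names array in place of the two append calls, though that has not been verified here against a PDF viewer.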
I hope this works for you.
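Disclaimer: I am the author of borb, the library used in this answer.
In borb, the Document class has a method add_embedded_file that takes a file name (which will be displayed in the PDF viewer) and the file's bytes.
This short snippet shows how to add an embedded file to an existing PDF: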
from borb.pdf import Document
from borb.pdf import PDF
import typing

doc: typing.Optional[Document] = None
with open("input.pdf", "rb") as fh:
    doc = PDF.loads(fh)

# The next line adds an embedded file to the PDF.
# In order to keep this example short, I've used an inline byte string
# but you can of course read a file, and use those bytes
doc.add_embedded_file("name.json", b"{}")

# store
with open("output.pdf", "wb") as fh:
    PDF.dumps(fh, doc)
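To attach several files, as in the question, the same call can presumably be repeated once per file. Here is a minimal sketch, assuming add_embedded_file takes a display name and bytes as described above. embed_files is a hypothetical helper, not part of borb, and it accepts any object that exposes that method.

```python
from pathlib import Path


def embed_files(doc, paths, names):
    """Call doc.add_embedded_file(name, data) once per (path, name) pair.

    `doc` is assumed to expose add_embedded_file(name, data);
    `paths` are the files on disk, `names` the names shown in the viewer.
    """
    if len(paths) != len(names):
        raise ValueError("paths and names must have the same length")
    for path, name in zip(paths, names):
        # Read the file contents as bytes and embed them under the display name.
        doc.add_embedded_file(name, Path(path).read_bytes())
```

With the lists from the question, this would be called as embed_files(doc, List, List1) between PDF.loads and PDF.dumps.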