Python PyPDF2: crop PDF pages and merge them into a single page



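I am trying to crop several PDF pages and merge them into a single roll-style page. I need to remove the header and footer from each page and build one continuous, roll-style page from what remains.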

import PyPDF2
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.pdf import PageObject
reader = PdfFileReader('/Users/kic/Desktop/test.pdf','r')
writer = PdfFileWriter()
### find Total Height of file ###
numpages = reader.getNumPages() ## get number of pages
Height = reader.getPage(0).mediaBox.getHeight() ## get height of title page
Height = Height + 482 * (reader.getNumPages()-2) ## add number of height crop pages
### create new single role page ###
Single_page = PageObject.createBlankPage(None, reader.getPage(0).mediaBox.getWidth(),    Height)
### add first title page without cropping ###
Single_page.mergeTranslatedPage(reader.getPage(0),0,Height-reader.getPage(0).mediaBox.getHeight(),False)
### loop through all pages from page 2 until last page ###
n=1
for i in range(reader.getNumPages()-1):
    i=n
    page = reader.getPage(i)
    page.cropBox.setUpperLeft((0,556))
    page.cropBox.setUpperRight((page.mediaBox.getWidth(),556))
    page.cropBox.setLowerLeft((0,74))
    page.cropBox.setLowerRight((page.mediaBox.getWidth(),74))
    Single_page.mergeTranslatedPage(page,0,482*(numpages-1-n),False)
    #writer.addPage(page) ##to see the result of the cropped pages without merging
    n = n+1
writer.addPage(Single_page)
output = open('/Users/kic/Desktop/testrcrop.pdf','wb')
writer.write(output)
output.close()

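For some reason it does not crop. It merges the pages into one page, but the headers and footers overlap each other. However, if I skip the merge and just write the cropped pages to a multi-page PDF file, they do show up cropped.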

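You should translate the cropped area and set new bounds on its media box to determine the position it occupies on the new page.
Translation is relative to the current position, whereas cropping uses absolute coordinates.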
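To make the relative/absolute distinction concrete, here is a minimal pure-Python sketch. It does not use PyPDF2; the `Box` class and the `translated` helper are hypothetical names introduced only for illustration. Moving content by `ty` shifts it relative to where it currently is, so the absolute crop bounds have to be recomputed by adding the same offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Vertical extent of a region, in absolute page coordinates."""
    bottom: float
    top: float


def translated(box: Box, ty: float) -> Box:
    """Return the absolute bounds of `box` after its content moves by `ty`."""
    return Box(bottom=box.bottom + ty, top=box.top + ty)


# Crop region of a body page in its own coordinates (values from the question).
crop = Box(bottom=74, top=556)

# Shifting the content up by 964 means the crop bounds must move by 964 too.
moved = translated(crop, 964)
print(moved)  # Box(bottom=1038, top=1520)
```

If the bounds were left at 74 and 556 after the shift, the clipping window would no longer cover the content that was moved.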

According to the mergeTranslatedPage() documentation, it is deprecated and you should use add_transformation() and merge_page() instead.

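The line y_translation = total_height - upper_bound calculates how far each page has to be translated, taking into account where its crop area must sit relative to the page previously placed on the new page.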

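For example:
You have 4 pages, each 800 high. Pages 2 through 4 are cropped to the band between 600 (top) and 200 (bottom), so the new page should be 2000 high (800 + 3 × 400).
The top of the first page is at 800, so you translate it by 1200 and crop it between 2000 and 1200. The top of the second page's crop area is at 600 and must end up at 1200 (2000 - 800), so you translate it by 600 and crop it between 1200 and 800.
In the list below, T is the translation, U the upper bound and B the lower bound after translation.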

  1. T=1200, U=2000, B=1200
  2. T=600, U=1200, B=800
  3. T=200, U=800, B=400
  4. T=-200, U=400, B=0
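The numbers in the list above can be reproduced with a short pure-Python sketch that leaves PyPDF2 out entirely. The function name `layout` and its parameters are assumptions made for this illustration; only the arithmetic comes from the example.

```python
def layout(first_height, crop_top, crop_bottom, num_pages):
    """Return the total height and a (translation, upper, lower) tuple per page."""
    region_height = crop_top - crop_bottom
    total_height = first_height + region_height * (num_pages - 1)
    placements = []
    cursor = total_height  # y coordinate where the next region's top edge goes
    for i in range(num_pages):
        # The first page is kept whole; the others keep only the crop band.
        upper = first_height if i == 0 else crop_top
        lower = 0 if i == 0 else crop_bottom
        ty = cursor - upper
        placements.append((ty, upper + ty, lower + ty))
        # Move the cursor down by the height of the region just placed.
        cursor -= upper - lower
    return total_height, placements


total, rows = layout(first_height=800, crop_top=600, crop_bottom=200, num_pages=4)
print(total)  # 2000
for n, (t, u, b) in enumerate(rows, start=1):
    print(f"{n}. T={t}, U={u}, B={b}")
```

Note that the cursor moves down by the height of the placed region (upper minus lower), not by the upper bound alone. Otherwise consecutive regions would overlap by the height of the removed footer.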
# Uses the non-deprecated PyPDF2 names (PdfReader, PdfWriter, reader.pages, add_page)
from PyPDF2 import PageObject, PdfReader, PdfWriter, Transformation

# Define the crop bounds for pages other than the first page
CROP_Y_TOP = 556
CROP_Y_HEIGHT = 482

# Set the input and output file paths
input_path = r"input.pdf"
output_path = r"output.pdf"

# Open the input and output files in binary read and write mode
with open(input_path, "rb") as input_file, open(output_path, "wb") as output_file:
    reader = PdfReader(input_file)
    writer = PdfWriter()
    num_pages = len(reader.pages)

    # Size of the first page, which is kept uncropped
    first_width = float(reader.pages[0].mediabox.width)
    first_height = float(reader.pages[0].mediabox.height)

    # Calculate the total height of the output page
    total_height = first_height + CROP_Y_HEIGHT * (num_pages - 1)

    # Create a blank page with the calculated total height
    single_page = PageObject.create_blank_page(
        pdf=None,
        width=first_width,
        height=total_height,
    )

    # Loop through all pages of the input document
    for i in range(num_pages):
        page = reader.pages[i]
        page_width = float(page.mediabox.width)

        # Determine the upper and lower bounds for the crop
        upper_bound = first_height if i == 0 else CROP_Y_TOP
        lower_bound = 0 if i == 0 else CROP_Y_TOP - CROP_Y_HEIGHT

        # Calculate the translation and apply it to the page content
        y_translation = total_height - upper_bound
        page.add_transformation(Transformation().translate(ty=y_translation))

        # Update the page media box with the new (absolute) bounds
        page.mediabox.lower_left = (0, lower_bound + y_translation)
        page.mediabox.upper_right = (page_width, upper_bound + y_translation)
        print(f"T={y_translation}\tU={upper_bound + y_translation}\tL={lower_bound + y_translation}")

        # Merge the transformed page onto the output page
        single_page.merge_page(page)

        # Move down by the height of the region just placed
        # (not by upper_bound alone, which would make the regions overlap)
        total_height -= upper_bound - lower_bound

    # Add the output page to the writer and write the output file
    writer.add_page(single_page)
    writer.write(output_file)
