Is it possible to use PDF::API2 to split a multi-document PDF based on its bookmarks? For example, if myfile.pdf contains the following bookmarks:
- bookmark1
- bookmark2
- bookmark3
It should then be split into the following separate PDF files:
- bookmark1.pdf
- bookmark2.pdf
- bookmark3.pdf
I can't find the term "bookmark" anywhere in the PDF::API2 documentation. Is that what it calls an outline?
Thanks!
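I tried this in pure Perl for a while, then gave up and handed the hard work to pdftk, although I still drive it from Perl. Here is a sample script for a file whose bookmark titles look like "Chapter 1" or "Appendix 1". You can probably adapt it, but be aware that parts of it are specific to my situation. I also use some newer features (for example `each` on an array and the `/r` substitution flag); if you don't want to require Perl 5.13, you can easily swap those parts out: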
use 5.013;
use Data::Dumper;
use File::Basename;
use File::Spec::Functions;
use File::Path qw(make_path);
my $pdftk = 'pdftk';
my $file = $ARGV[0];
say("\n$0 <FILENAME>") && exit 1 unless $file;
my $dir = dirname( $file ) || '.';
my $output_dir = $ARGV[1] || $dir;
unless( -e $output_dir ) {
make_path $output_dir, { mode => 0755 } unless -e $output_dir;
die "mkdir failed: $!" unless -e $output_dir;
}
my $string = `$pdftk @{[quotemeta($file)]} dump_data output -`;
my( $last_page ) = $string =~ m/NumberOfPages: (\d+)/;
say "last page is $last_page";
my $regex = qr/
BookmarkTitle: \s+ (?<title>.*?) \s+
BookmarkLevel: \s+ (?<level>\d+) \s+
BookmarkPageNumber: \s+ (?<page>\d+)
/x;
my @page_numbers;
while( $string =~ /$regex/g ) {
next unless $+{level} == 1;
push @page_numbers, [ @+{ qw(title page) } ];
}
say "Last index is $#page_numbers";
# Titles look like: Chapter 1. Introduction
while( my( $index, $elem ) = each @page_numbers ) {
last if $index == $#page_numbers;
$page_numbers[$index]->[0] =~ s/ / /g;
unshift @$elem,
$page_numbers[$index]->[0] =~ s/(?:Chapter|Appendix)\s+(\d+|[ABC]|)\.?\s+//g
?
$1
:
'XX';
last if $index == $#page_numbers;
push @$elem, $page_numbers[$index+1]->[-1] - 1;
}
unshift @{ $page_numbers[-1] }, 'XX';
push @{ $page_numbers[-1] }, $last_page;
print Dumper( \@page_numbers );
# pdftk A=one.pdf B=two.pdf cat A1-7 B1-5 A8 output combined.pdf
foreach my $elem ( @page_numbers ) {
my $chapter = $elem->[1] =~ s/\s+/_/rg;
my $filename = catfile( $output_dir, "$elem->[0].$chapter.pdf" );
say "Splitting Chapter $elem->[0] $elem->[1]";
print "Running ", join ' ', $pdftk, $file, 'cat', "$elem->[2]-$elem->[3]", 'output', $filename, "\n";
system $pdftk, $file, 'cat', "$elem->[2]-$elem->[3]", 'output', $filename;
}
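If you want to check the page-range arithmetic without running pdftk, the core of the script can be pulled out into a small function. This is only a sketch: `bookmark_ranges` is a name I made up, and the sample text is invented data shaped like `pdftk dump_data` output (newer pdftk versions add a `BookmarkBegin` line before each entry, which the pattern simply skips over). It only needs named captures, so it runs on Perl 5.10 and later.

```perl
use strict;
use warnings;

# Hypothetical helper: turn dump_data text into a list of
# [ title, first_page, last_page ] for every top-level bookmark.
sub bookmark_ranges {
    my ($dump) = @_;

    my ($last_page) = $dump =~ /^NumberOfPages:\s+(\d+)/m
        or die "no NumberOfPages line found\n";

    my @marks;
    while ( $dump =~ /
        ^BookmarkTitle:      \s+ (?<title>.*?) \s*\n
         BookmarkLevel:      \s+ (?<level>\d+) \s*\n
         BookmarkPageNumber: \s+ (?<page>\d+)
    /xmg ) {
        next unless $+{level} == 1;    # ignore nested bookmarks
        push @marks, [ $+{title}, $+{page} ];
    }

    # Each section ends on the page before the next top-level bookmark;
    # the final section runs to the end of the document.
    for my $i ( 0 .. $#marks ) {
        my $end = $i < $#marks ? $marks[ $i + 1 ][1] - 1 : $last_page;
        push @{ $marks[$i] }, $end;
    }
    return @marks;
}

# Invented sample data, not output from a real file.
my $sample = <<'END';
NumberOfPages: 30
BookmarkBegin
BookmarkTitle: Chapter 1. Introduction
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: 1.1 Background
BookmarkLevel: 2
BookmarkPageNumber: 3
BookmarkBegin
BookmarkTitle: Chapter 2. Methods
BookmarkLevel: 1
BookmarkPageNumber: 12
BookmarkBegin
BookmarkTitle: Appendix A. Tables
BookmarkLevel: 1
BookmarkPageNumber: 25
END

printf "%-28s pages %d-%d\n", @$_ for bookmark_ranges($sample);
```

Each returned triple maps directly onto a `pdftk FILE cat FIRST-LAST output NAME.pdf` call like the one in the loop above. Note that, as in the original script, two top-level bookmarks on the same page would give a range whose end is before its start, so you may want to guard against that for your own files.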