Improving my Perl algorithm for merging PostScript show commands



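The PostScript output of Matlab R2007b is broken. I have found that text strings in the PostScript output (simprintdiag) are split into many "moveto" and "show" commands. This causes problems when the output is converted to PDF, because extra whitespace is sometimes inserted into labels (so you cannot double-click them, and searches do not find them!).

To work around this I have written a Perl script that joins these split "show" commands back together. However, it has some problems and I need some help.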

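  1. Show commands such as "(0) s" are not reproduced correctly and turn up in the next block
  2. The input PostScript file is always modified by the script, even when no changes are needed
  3. There is a hack at the start to work around show commands that continue over several lines
  4. It is not very fast, and since some projects have over 2000 PostScript files, any speed improvement is welcome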
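
For the second problem, one possible approach is to compare the transformed text with the original and only write the file back when they differ. The following is a minimal sketch, not part of the original script: `rewrite_if_changed` is a hypothetical helper name, and the transformation is passed in as a code reference so that any merging routine can be plugged in.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): rewrite $path only
# when $transform actually changes its contents.
# Returns 1 if the file was rewritten, 0 if it was left untouched.
sub rewrite_if_changed {
    my ($path, $transform) = @_;

    # Slurp the whole file in one read.
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $original = do { local $/; <$in> };
    close $in;
    $original = '' unless defined $original;    # empty file

    my $updated = $transform->($original);
    return 0 if $updated eq $original;          # nothing to fix: leave the file alone

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} $updated;
    close $out or die "Cannot close $path: $!";
    return 1;
}
```

Reading each file in a single slurp rather than line by line may also help a little with the fourth problem, although that has not been measured here.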

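The DATA section in the code below has four examples of text strings split across mt and s commands. I have included what the final output should look like. The script relies on the fact that our text is written left to right, which in PostScript terms means a moving X coordinate and a fixed Y coordinate. It therefore concludes that consecutive mt commands with the same Y coordinate belong to the same text string.

Any help is appreciated.

Thanks :)

My Perl script: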

use strict;
use warnings;
my $debug=1;
#
## Slurp the input file into a variable
my $ps_in;
while(<DATA>) {
    $ps_in .= $_;     # Take a copy of input file
}

#
## HACK
## The main PS fix algorithm only works with show commands on a single
## line!  Fix the input contents now by joining all show commands that 
## occur over multiple lines.  Examples of this are:
##  272   63 mt 
## (main is an externally linked function of the ACC feature ru\
## nning every ) s
##  991   63 mt
## (100) s
my $buf;
my $no_show_split;
open(my $fh_ps, "<", \$ps_in );
while(<$fh_ps>) {
    if( /^(.*)\\$/ ) {   # Match on all lines ending with backslash
        $buf .= $1;
    }
    else {
        if( $buf ) {
            $no_show_split .= $buf;
            undef($buf);
        }
        $no_show_split .= $_;
    }
}
close $fh_ps;
#
## Reopen our ps input, now the show splits have been removed
open($fh_ps,"<",\$no_show_split );
my $moveto_line = qr/^\s*\d+\s+(\d+)\s+(mt|moveto)/;  # Example '2831  738 mt'
my $show_line   = qr/^\((.+)\)\s+(s|show)/;           # Example '(chris) s'
my $ycrd;      # Y-axis cords
my $pstxt;     # Text to display
my $mtl;       # Moveto line
my $print_text;
my $fixes=0;
my $ps_condensed;
while(<$fh_ps>) {
    if( $print_text ) {
        $ps_condensed .= "$mtl\n";
        $ps_condensed .= "($pstxt) s\n";
        print "($pstxt) s\n====================\n" if $debug;
        undef($ycrd);
        undef($pstxt);
        $print_text=0;
        ++$fixes;
    }
    if( /$moveto_line/ ) {
        chomp;
        if( !$ycrd ) {
            $mtl=$_;       # Store this line for print later
            $ycrd=$1;      # Match on y-axis value
            redo;          # Redo this iteration so we can read the show line in
        }
        elsif( $1 == $ycrd ) {
            <$fh_ps> =~ /$show_line/;  # Read in the show line
            $pstxt .= $1;              # Built up string we want
            print " $mtl -->$1<--\n" if $debug;
        }
        else {
            $print_text=1; # Dropped out matching on y-cord so force a print
            redo;          # Need to redo this line again
        }
    }
    else {
        if( $pstxt ) {     # Print if we have something in buffer
            $print_text=1;
            redo;
        }
        $ps_condensed .= $_;
    }
} # End While Loop
close $fh_ps;
print $ps_condensed;

__DATA__
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 60 FMSR
11214 11653 mt 
(0) s
4.5 w
156 0 2204 19229 2 MP stroke
156 0 2204 19084 2 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
8913 14971 mt 
(Function) s
9405 14971 mt 
(-) s
9441 14971 mt 
(Call) s
9009 15127 mt 
(Generator) s
6 w

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
4962 4747 mt 
(trigger) s
5322 4747 mt 
(_) s
5394 4747 mt 
(scheduler) s
5934 4747 mt 
(_) s
6006 4747 mt 
(100) s
6222 4747 mt 
(ms) s
6378 4747 mt 
(_) s
6450 4747 mt 
(task) s
6654 4747 mt 
(_) s
6726 4747 mt 
(06) s
6 w
gr
24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
5454 11947 mt 
(Chris_\
did_this_example_) s
5874 11947 mt 
(to_test) s
5946 11947 mt 
(_out) s
6 w

What the final "condensed" PostScript should look like:

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 60 FMSR
11214 11653 mt 
(0) s
4.5 w
156 0 2204 19229 2 MP stroke
156 0 2204 19084 2 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
8913 14971 mt 
(Function-Call) s
9009 15127 mt 
(Generator) s
6 w

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
4962 4747 mt 
(trigger_scheduler_100ms_task_06) s
6 w
gr
24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
5454 11947 mt 
(Chris_did_this_example_to_test_out) s
6 w

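I think the following will work for you.

Notes:

  • Uses the idiom do { local $/; <DATA> } to slurp all of the data
  • Uses a single regex to fix up the backslashes at the end of lines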

use strict;
use warnings;
my $data = do { local $/; <DATA> };
$data =~ s,\\\n,,g;    # join lines continued with a trailing backslash
my $out = "";
my $s = "";            # text collected for the current run
my $y;                 # Y coordinate of the current run
for my $line (split("\n", $data)) {
    if (defined($y) && $line =~ m/^\((.*)\)\s+s\s*$/) {
        $s .= $1;
        next;
    } elsif ($line =~ m/^(\d+)\s+(\d+)\s+mt\s*$/) {
        if (defined($y) && $y == $2) {
            next;      # same Y as the current run: drop this moveto
        } else {
            $y = $2;
        }
    } else {
        $y = undef;
    }
    if (length($s)) {
        $out .= "($s) s\n";
        $s = "";
    }
    $out .= "$line\n";
}
print $out;
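
The loop above could also be wrapped in a subroutine so it can be applied to one file after another. The following is a sketch under a few assumptions that the question does not confirm: `merge_shows` is a hypothetical name, the input uses LF line endings, coordinates are unsigned integers, and every moveto inside a run is immediately followed by a show (as in the sample data). It also accepts the long operator names and flushes a run that ends the input.

```perl
use strict;
use warnings;

# Hypothetical helper: merge consecutive "X Y mt" / "(text) s" pairs that
# share the same Y coordinate into a single show command.
sub merge_shows {
    my ($ps) = @_;

    # Join strings that were continued with a trailing backslash.
    $ps =~ s/\\\n//g;

    my $out  = '';
    my $text = '';     # text collected for the current run
    my $op   = 's';    # show operator seen in the current run
    my $y;             # Y coordinate of the current run, undef outside a run

    for my $line (split /\n/, $ps) {
        if (defined $y && $line =~ /^\((.*)\)\s+(s|show)\s*$/) {
            $text .= $1;                      # extend the current run
            $op    = $2;
            next;
        }
        if ($line =~ /^\s*\d+\s+(\d+)\s+(?:mt|moveto)\s*$/) {
            next if defined $y && $y == $1;   # same baseline: drop this moveto
            $y = $1;                          # new baseline: start a new run
        }
        else {
            undef $y;                         # any other line ends the run
        }
        if (length $text) {                   # flush the finished run
            $out .= "($text) $op\n";
            $text = '';
        }
        $out .= "$line\n";
    }
    $out .= "($text) $op\n" if length $text;  # flush a run that ends the input
    return $out;
}
```

A call such as `print merge_shows(do { local $/; <DATA> });` would reproduce the behaviour of the loop above on the sample data.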

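I don't see a general way to do this, but a series of special cases seems to work. The weakness is that adding more and more special cases is not a model that scales well. Still, if this is the complete list of problem cases, it should do.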

#!/usr/bin/perl -Tw
use strict;
use warnings;
my %regex_for = (
    a => qr{
        \( ( \w+ ) \)     \s s  \s+  # (Function) s
        \d+ \s+ \d+       \s mt \s+  # 9405 14971 mt
        \( ( [-_]|ms ) \) \s s  \s+  # (-) s
        \d+ \s+ \d+       \s mt \s+  # 9441 14971 mt
        \( ( \w+ ) \)     \s s  \s+  # (Call) s
    }xmsi,
    b => qr{
        \( ( \w+ ) \\ \s* ( \w+ ) \)  # (Chris_\
    }xms,    #  did_this_example_)
    c => qr{
        \( ( \w+ _ ) \) \s s  \s+  # (Chris_did_this_example_) s
        \d+ \s+ \d+     \s mt \s+  # 5874 11947 mt
        \( ( \w+ ) \)   \s s  \s+  # (to_test) s
    }xms,
    d => qr{
        \( ( \w+ ) \)   \s s  \s+  # (to_test) s
        \d+ \s+ \d+     \s mt \s+  # 5946 11947 mt
        \( ( _ \w+ ) \) \s s  \s+  # (_out) s
    }xms,
);
my $ps = do { local $/; <DATA> };
REGSUB:
{
    my $a = $ps =~ s{ $regex_for{a} }{($1$2$3) s\n}xmsg;
    my $b = $ps =~ s{ $regex_for{b} }{($1$2)}xmsg;
    my $c = $ps =~ s{ $regex_for{c} }{($1$2) s\n}xmsg;
    my $d = $ps =~ s{ $regex_for{d} }{($1$2) s\n}xmsg;
    redo REGSUB
        if $a || $b || $c || $d;
}
print $ps;
__DATA__
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 60 FMSR
11214 11653 mt
(0) s
4.5 w
156 0 2204 19229 2 MP stroke
156 0 2204 19084 2 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
8913 14971 mt
(Function) s
9405 14971 mt
(-) s
9441 14971 mt
(Call) s
9009 15127 mt
(Generator) s
6 w

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
4962 4747 mt
(trigger) s
5322 4747 mt
(_) s
5394 4747 mt
(scheduler) s
5934 4747 mt
(_) s
6006 4747 mt
(100) s
6222 4747 mt
(ms) s
6378 4747 mt
(_) s
6450 4747 mt
(task) s
6654 4747 mt
(_) s
6726 4747 mt
(06) s
6 w
gr
24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR
5454 11947 mt
(Chris_\
did_this_example_) s
5874 11947 mt
(to_test) s
5946 11947 mt
(_out) s
6 w
