How to navigate Word tables using the Win32::OLE Perl package



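I have a directory with hundreds of Word documents, each containing a set of standardized tables. I need to parse these tables and extract the data from them. So far I have written a script that dumps each table in its entirety.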

#!/usr/bin/perl
use strict;
use warnings;
use Carp qw( croak );
use Cwd qw( abs_path );
use Path::Class;
use Win32::OLE qw(in);
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;
# Collect every .doc file in the datasheet directory.
my $datasheet_dir = "./path/to/worddocs";
my @files = glob "$datasheet_dir/*.doc";
print "scalar: " . scalar(@files) . "\n";
foreach my $f (@files) {
    print $f . "\n";
}
# To test against a single document instead, use: run(["word.doc"]);
run(\@files);    # run() expects an array reference
sub run {
    my $argv = shift;
    my $word = get_word();
    $word->{DisplayAlerts} = wdAlertsNone;
    $word->{Visible}       = 1;
    for my $word_file ( @$argv ) {
        print_tables($word, $word_file);
    }
    return;
}
sub print_tables {
    my $word = shift;
    my $word_file = file(abs_path(shift));
    my $doc = $word->{Documents}->Open("$word_file");
    my $tables = $word->ActiveDocument->{Tables};
    for my $table (in $tables) {
        my $text = $table->ConvertToText(wdSeparateByTabs)->Text;
        $text =~ s/\r/\n/g;    # Word separates rows with carriage returns
        print $text, "\n";
    }
    $doc->Close(0);
    return;
}
sub get_word {
    my $word;
    eval { $word = Win32::OLE->GetActiveObject('Word.Application'); 1 }
        or die "$@\n";
    $word and return $word;
    $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
        or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";
    return $word;
}

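Is there a way to navigate through the individual cells? I only want to return the rows that have a specific value in the first column.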

For example, for the table below, I only want to grab the rows that have a fruit in the first column.

apple       pl
banana      xml
California  csv
pickle      txt
Illinois    gov
pear        doc

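You can access the individual cells of a table through OLE, once you have obtained the table's dimensions from its Columns and Rows collections.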
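The cell-by-cell approach could look roughly like the sketch below. The helper names `cell_text` and `rows_with_first_column` are hypothetical, and `$table` is assumed to be a Word Table object obtained through Win32::OLE (for example one element of `in $doc->{Tables}`). It relies on the Word object model's `Rows->Count`, `Columns->Count` and `Cell(row, column)`, and it assumes a regular grid: tables with merged cells may make `Cell` fail for some coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: text of one cell, without Word's end-of-cell
# marker (a carriage return followed by chr(7)).
sub cell_text {
    my ($table, $row, $col) = @_;
    my $text = $table->Cell($row, $col)->Range->Text;
    $text =~ s/[\r\x07]+\z//;
    return $text;
}

# Hypothetical helper: returns one array reference per table row whose
# first-column text is a key in %$wanted.
sub rows_with_first_column {
    my ($table, $wanted) = @_;
    my $n_rows = $table->Rows->Count;
    my $n_cols = $table->Columns->Count;
    my @matches;
    for my $r (1 .. $n_rows) {    # Word indexes rows and columns from 1
        # Check the first column before reading the rest of the row.
        next unless exists $wanted->{ cell_text($table, $r, 1) };
        push @matches, [ map { cell_text($table, $r, $_) } 1 .. $n_cols ];
    }
    return @matches;
}
```

Inside `print_tables`, this could be called as `rows_with_first_column($table, \%fruit)` in place of the `ConvertToText` call, printing each returned row with `join "\t", @$row`.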

Alternatively, you can post-process the text into a Perl array and iterate over that. Instead of

my $text = $table->ConvertToText(wdSeparateByTabs)->Text;
$text =~ s/\r/\n/g;
print $text, "\n";

you could use something like

my %fruit; # population of look-up table of fruit omitted
my $text = $table->ConvertToText(wdSeparateByTabs)->Text;
my @lines = split /\r/, $text;       # one element per table row
for my $line ( @lines ) {
    my @fields = split /\t/, $line;  # one element per cell
    next unless @fields and exists $fruit{$fields[0]};
    print "$line\n";
}

Refinements such as case-insensitive matching can be added as needed.
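As one possible refinement, the sketch below matches the first column case-insensitively and ignores surrounding whitespace. The helper name `matching_lines` and the contents of `%fruit` are assumptions made for illustration. The sample string only imitates what `ConvertToText(wdSeparateByTabs)->Text` returns (rows separated by carriage returns, cells by tabs), so the snippet runs without Word.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $text whose first tab-separated
# field, lowercased and trimmed, is a key in %$wanted.
sub matching_lines {
    my ($text, $wanted) = @_;
    my @matches;
    for my $line (split /\r/, $text) {
        my ($first) = split /\t/, $line;
        next unless defined $first;      # skip empty lines
        $first =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
        push @matches, $line if exists $wanted->{ lc $first };
    }
    return @matches;
}

# Assumed look-up table; keys must be lowercase.
my %fruit = map { $_ => 1 } qw(apple banana pear);

# Stand-in for the converted table text from the question.
my $sample = join "\r",
    "Apple\tpl", "banana\txml", "California\tcsv",
    "pickle\ttxt", "Illinois\tgov", "pear\tdoc";

print "$_\n" for matching_lines($sample, \%fruit);
```

Keeping the look-up keys lowercase and applying `lc` only to the cell value means the table text itself is printed unchanged.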
