如何从文本文件中读取表格数据 - Perl.



我们有文本文件,其中包含正常和表格形式的数据。 我可以读取正常数据,但我无法读取表格形式的数据。

任何人都可以帮我阅读并提取表格数据。

文本文件数据 :

    225 Top Hitters
    RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w
    --------  ---------  --------  ---------  ---------  ---------  ---------  -----------  -----------  -----------  -----------
     11078.9      141.3    3754.8        418       7325          0          0            0            4            0            4

Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 7333

我的代码:

    use strict;
    use warnings;
    my ($RT,$BRT,$TL ,$l_mig_a,$l_mig_w,$b_mig_a,$b_mig_w,$l_b_mig_a,$l_b_mig_w,$b_l_mig_a,$b_l_mig_w);
    open (FH, "<" ,"file.txt") or print "could not open $!";
    my @lines = <FH>;
    close FH;
    foreach my $line (@lines) {
        print "$line n";
    }

预期输出 :

$RT = 11078.9
$BRT = 141.3
$TL = 3754.8
$l_mig_a = 418
$l_mig_w = 7325
$b_mig_a = 0
$b_mig_w = 0
$l_b_mig_a = 0
$l_b_mig_w = 4
$b_l_mig_a = 0
$b_l_mig_w = 4

在预期的输出中,在每个标头名称之前包含一个$我希望您的意图不是eval结果并以编程方式使用这些值,因为有更好的方法可以做到这一点(例如,哈希)。如果这是您的计划,那么您在行尾也缺少分号。

由于我无法从您的问题中推断出您的用例,因此我决定简单地按原样转储键和值;随意添加您想要的任何修饰。

use strict;
use warnings;
my @keys;
my @values;
while (<DATA>) {
    if ($. == 2) {
        @keys = split;
        for (@keys) {
            s/W.+$//;
        }
    } elsif ($. == 4) {
        @values = split;
        last;
    }
}
for my $i (0 .. $#keys) {
    print "$keys[$i] = $values[$i]n";
}

__DATA__
225 Top Hitters
RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w
--------  ---------  --------  ---------  ---------  ---------  ---------  -----------  -----------  -----------  -----------
 11078.9      141.3    3754.8        418       7325          0          0            0            4            0            4

Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 7333

如果您的输入文件实际上只有 10 行长(即,您没有告诉我们额外的 500 万行数据行),您可以简化读取和拆分为几行代码:

my @lines  = <DATA>;
my @keys   = map { s/W.+$//r } split(' ', $lines[1]);
my @values = split(' ', $lines[3]);

输出:

RT = 11078.9
BRT = 141.3
TL = 3754.8
l_mig_a = 418
l_mig_w = 7325
b_mig_a = 0
b_mig_w = 0
l_b_mig_a = 0
l_b_mig_w = 4
b_l_mig_a = 0
b_l_mig_w = 4

若要收集值以供以后在程序中使用,同时保持标头和值之间的关联,请创建哈希:

my %hash;
@hash{@keys} = @values;

哈希将具有以下结构:

{
  b_l_mig_a => 0,
  b_l_mig_w => 4,
  b_mig_a => 0,
  b_mig_w => 0,
  BRT => 141.3,
  l_b_mig_a => 0,
  l_b_mig_w => 4,
  l_mig_a => 418,
  l_mig_w => 7325,
  RT => 11078.9,
  TL => 3754.8,
}

这是 Matt 的另一种策略,它搜索文件中包含一个或多个连字符的第一行,-、可能的空格,没有其他任何内容。然后列标签在上一行,值在下一行

use strict;
use warnings 'all';
use List::Util 'max';
use constant DATA_FILE => 'tabular_data.txt';
# Read the whole file into an array
my @file = do {
    open my $fh, '<', DATA_FILE or die $!;
    <$fh>;
};
chomp @file;
# Find the first line that contains only one or more hyphens
# and possibly some whitespace
my $i = 0;
for ( @file ) {
    last if /-/ and not /[^-s]/;
    ++$i;
}
die "Header line not found" unless $i < @file;
# Build the key array from the preceding line, and the
# values array from the succeeding line
my @keys = split ' ', $file[$i-1];
s/(.*// for @keys;
my @values = split ' ', $file[$i+1];
my %data;
@data{@keys} = @values;
# Display what we've recovered
my $w = max map length, @keys;
for my $key ( @keys ) {
    printf "%-*s => %sn", $w, $key, $data{$key};
}

输出

RT        => 11078.9
BRT       => 141.3
TL        => 3754.8
l_mig_a   => 418
l_mig_w   => 7325
b_mig_a   => 0
b_mig_w   => 0
l_b_mig_a => 0
l_b_mig_w => 4
b_l_mig_a => 0
b_l_mig_w => 4

您可以将整个文件"slurp"为单个字符串变量,并使用正则表达式来解析表格数据。请在下面找到带有子例程的示例脚本,以简化正则表达式的生成。

请在下面找到示例实现,其中测试数据与代码捆绑到单个脚本/文件中。

use strict;
use warnings;
my $text;
{
  # put all lines into single string
  local $/ = undef;
  $text = <DATA>;
}
my $regex = &make_regex(qw{RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w});
print "REGEX-STARTn$regexnREGEX-ENDn"; # Debuging: Show generated regular expression
my ($RT,$BRT,$TL ,$l_mig_a,$l_mig_w,$b_mig_a,$b_mig_w,$l_b_mig_a,$l_b_mig_w,$b_l_mig_a,$b_l_mig_w) 
   = $text =~ /$regex/ or die;
print "b_l_mig_w = $b_l_mig_wn";
sub make_regex {
  my $n = scalar(@_);
  my $str = '
s*' . join('s+',map {quotemeta($_)} @_) . 's*
s*' . join('s+',('-+')    x $n) . 's*
s*' . join('s+',('(S+)') x $n) . 's*
';
  qr{$str}m;
} # end sub make_regex
__DATA__
    225 Top Hitters
    RT(ms)    BRT(ms)    TL(ms)    l_mig_a    l_mig_w    b_mig_a    b_mig_w    l_b_mig_a    l_b_mig_w    b_l_mig_a    b_l_mig_w
    --------  ---------  --------  ---------  ---------  ---------  ---------  -----------  -----------  -----------  -----------
     11078.9      141.3    3754.8        418       7325          0          0            0            4            0            4

Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 733

最新更新