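We have a text file that contains both ordinary and tabular data. I can read the ordinary data, but I cannot read the tabular data.
Can anyone help me read and extract the tabular data?
Text file data: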
225 Top Hitters
RT(ms) BRT(ms) TL(ms) l_mig_a l_mig_w b_mig_a b_mig_w l_b_mig_a l_b_mig_w b_l_mig_a b_l_mig_w
-------- --------- -------- --------- --------- --------- --------- ----------- ----------- ----------- -----------
11078.9 141.3 3754.8 418 7325 0 0 0 4 0 4
Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 7333
My code:
use strict;
use warnings;
my ($RT,$BRT,$TL ,$l_mig_a,$l_mig_w,$b_mig_a,$b_mig_w,$l_b_mig_a,$l_b_mig_w,$b_l_mig_a,$b_l_mig_w);
open (FH, "<" ,"file.txt") or print "could not open $!";
my @lines = <FH>;
close FH;
foreach my $line (@lines) {
print "$line \n";
}
Expected output:
$RT = 11078.9
$BRT = 141.3
$TL = 3754.8
$l_mig_a = 418
$l_mig_w = 7325
$b_mig_a = 0
$b_mig_w = 0
$l_b_mig_a = 0
$l_b_mig_w = 4
$b_l_mig_a = 0
$b_l_mig_w = 4
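In your expected output you include a $ before each header name. I hope your intention is not to eval the result and use the values programmatically, because there are better ways to do that (for example, a hash). If that is your plan, you are also missing semicolons at the ends of the lines.
Since I cannot infer your use case from the question, I have simply dumped the keys and values as they are; feel free to add whatever decoration you want.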
use strict;
use warnings;
my @keys;
my @values;
while (<DATA>) {
if ($. == 2) {
@keys = split;
for (@keys) {
s/\W.+$//;   # drop everything from the first non-word character, e.g. "(ms)"
}
} elsif ($. == 4) {
@values = split;
last;
}
}
for my $i (0 .. $#keys) {
print "$keys[$i] = $values[$i]\n";
}
__DATA__
225 Top Hitters
RT(ms) BRT(ms) TL(ms) l_mig_a l_mig_w b_mig_a b_mig_w l_b_mig_a l_b_mig_w b_l_mig_a b_l_mig_w
-------- --------- -------- --------- --------- --------- --------- ----------- ----------- ----------- -----------
11078.9 141.3 3754.8 418 7325 0 0 0 4 0 4
Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 7333
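If your input file really is only 10 lines long (that is, you are not hiding an extra 5 million rows of data from us), you can reduce the reading and splitting to a few lines of code: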
my @lines = <DATA>;
my @keys = map { s/\W.+$//r } split(' ', $lines[1]);
my @values = split(' ', $lines[3]);
Output:
RT = 11078.9
BRT = 141.3
TL = 3754.8
l_mig_a = 418
l_mig_w = 7325
b_mig_a = 0
b_mig_w = 0
l_b_mig_a = 0
l_b_mig_w = 4
b_l_mig_a = 0
b_l_mig_w = 4
To collect the values for later use in your program while keeping each header associated with its value, build a hash:
my %hash;
@hash{@keys} = @values;
The hash will have the following structure:
{
b_l_mig_a => 0,
b_l_mig_w => 4,
b_mig_a => 0,
b_mig_w => 0,
BRT => 141.3,
l_b_mig_a => 0,
l_b_mig_w => 4,
l_mig_a => 418,
l_mig_w => 7325,
RT => 11078.9,
TL => 3754.8,
}
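Once the values are in a hash, individual columns can be read by name. The following is a minimal sketch, assuming the same header and value lines as the sample data; $total is a hypothetical name introduced only for illustration, and the count check is an extra safeguard that is not part of the answer above. The s///r form needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Header and value lines copied from the sample data
my $header = 'RT(ms) BRT(ms) TL(ms) l_mig_a l_mig_w b_mig_a b_mig_w l_b_mig_a l_b_mig_w b_l_mig_a b_l_mig_w';
my $row    = '11078.9 141.3 3754.8 418 7325 0 0 0 4 0 4';

my @keys   = map { s/\W.+$//r } split ' ', $header;
my @values = split ' ', $row;

# Guard against a header/value count mismatch before building the hash
die "Column count mismatch\n" unless @keys == @values;

my %hash;
@hash{@keys} = @values;

# Look up a single column by name
print "RT = $hash{RT}\n";

# Sum several columns with a hash slice ($total is a hypothetical name)
my $total = 0;
$total += $_ for @hash{qw(l_mig_w l_b_mig_w b_l_mig_w)};
print "wakeup total = $total\n";
```

A hash slice such as @hash{qw(...)} returns the values in the order the keys are listed, so it is a convenient way to pull out a few related columns at once.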
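Here is an alternative to Matt's strategy: it searches for the first line in the file that contains one or more hyphens (-), possibly whitespace, and nothing else. The column labels are then on the preceding line and the values on the following line.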
use strict;
use warnings 'all';
use List::Util 'max';
use constant DATA_FILE => 'tabular_data.txt';
# Read the whole file into an array
my @file = do {
open my $fh, '<', DATA_FILE or die $!;
<$fh>;
};
chomp @file;
# Find the first line that contains only one or more hyphens
# and possibly some whitespace
my $i = 0;
for ( @file ) {
last if /-/ and not /[^-\s]/;
++$i;
}
die "Header line not found" unless $i < @file;
# Build the key array from the preceding line, and the
# values array from the succeeding line
my @keys = split ' ', $file[$i-1];
s/\(.*// for @keys;
my @values = split ' ', $file[$i+1];
my %data;
@data{@keys} = @values;
# Display what we've recovered
my $w = max map length, @keys;
for my $key ( @keys ) {
printf "%-*s => %s\n", $w, $key, $data{$key};
}
Output:
RT => 11078.9
BRT => 141.3
TL => 3754.8
l_mig_a => 418
l_mig_w => 7325
b_mig_a => 0
b_mig_w => 0
l_b_mig_a => 0
l_b_mig_w => 4
b_l_mig_a => 0
b_l_mig_w => 4
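The same separator-line search can be wrapped in a subroutine that works on any list of lines, which makes it easy to try without a file on disk. This is only a sketch: parse_table is a hypothetical name that does not appear in the answer above, the input is a shortened copy of the sample data, and it assumes there is a single table per input.

```perl
use strict;
use warnings;

# parse_table is a hypothetical helper: it takes a list of lines and
# returns a hash of column name => value for the first table found
sub parse_table {
    my @lines = @_;
    chomp @lines;

    for my $i ( 1 .. $#lines - 1 ) {
        # Separator line: at least one hyphen, and nothing but
        # hyphens and whitespace
        next unless $lines[$i] =~ /-/ and $lines[$i] !~ /[^-\s]/;

        my @keys = split ' ', $lines[ $i - 1 ];
        s/\(.*// for @keys;    # strip a unit suffix such as "(ms)"
        my @values = split ' ', $lines[ $i + 1 ];

        my %data;
        @data{@keys} = @values;
        return %data;
    }
    die "Header line not found\n";
}

# Shortened copy of the sample data, three columns only
my %data = parse_table(
    "225 Top Hitters\n",
    "RT(ms) BRT(ms) TL(ms)\n",
    "-------- --------- --------\n",
    "11078.9 141.3 3754.8\n",
    "Total active migrations: 418\n",
);
print "$_ => $data{$_}\n" for sort keys %data;
```

Starting the loop at index 1 and stopping one short of the last index means the lines before and after the separator always exist, so no extra bounds check is needed.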
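You can slurp the whole file into a single string variable and parse the tabular data with a regular expression. The sample script below uses a subroutine to simplify generating the regular expression.
In this sample implementation the test data is bundled with the code in a single script/file.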
use strict;
use warnings;
my $text;
{
# put all lines into single string
local $/ = undef;
$text = <DATA>;
}
my $regex = &make_regex(qw{RT(ms) BRT(ms) TL(ms) l_mig_a l_mig_w b_mig_a b_mig_w l_b_mig_a l_b_mig_w b_l_mig_a b_l_mig_w});
print "REGEX-START\n$regex\nREGEX-END\n"; # Debugging: show the generated regular expression
my ($RT,$BRT,$TL ,$l_mig_a,$l_mig_w,$b_mig_a,$b_mig_w,$l_b_mig_a,$l_b_mig_w,$b_l_mig_a,$b_l_mig_w)
= $text =~ /$regex/ or die;
print "b_l_mig_w = $b_l_mig_w\n";
sub make_regex {
my $n = scalar(@_);
my $str = '
\s*' . join('\s+',map {quotemeta($_)} @_) . '\s*
\s*' . join('\s+',('-+') x $n) . '\s*
\s*' . join('\s+',('(\S+)') x $n) . '\s*
';
qr{$str}m;
} # end sub make_regex
__DATA__
225 Top Hitters
RT(ms) BRT(ms) TL(ms) l_mig_a l_mig_w b_mig_a b_mig_w l_b_mig_a l_b_mig_w b_l_mig_a b_l_mig_w
-------- --------- -------- --------- --------- --------- --------- ----------- ----------- ----------- -----------
11078.9 141.3 3754.8 418 7325 0 0 0 4 0 4
Total active inter-cluster migrations: 0
Total wakeup inter-cluster migrations: 8
Total active migrations: 418
Total wakeup migrations: 7333