How to read data from a .gz file very quickly in Perl



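I am reading a .gz file of about 3 GB and using a Perl program to search it for a pattern. I can grep the pattern, but processing takes too long. Can anyone help me process it faster?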

use strict;
use warnings;
use Compress::Zlib;

my $file = "test.gz";
my $gz = gzopen($file, "rb") or die "Error reading $file: $gzerrno";
# Read the archive line by line; gzreadline returns the number of bytes read
while ($gz->gzreadline($_) > 0) {
    if (/pattern/) {
        print "$_----->PASS\n";
    }
}
# After a clean read to the end of the file, $gzerrno equals Z_STREAM_END
die "Error reading $file: $gzerrno" if $gzerrno != Z_STREAM_END;
$gz->gzclose();

What does the Z_STREAM_END variable do?

I wrote a script that times different methods of reading a gz file. I also found Compress::Zlib to be slow.

use strict;
use warnings;
use autodie ':all';
use Compress::Zlib;
use Time::HiRes 'time';

my $file = '/home/con/Documents/snp150.txt.gz';

# Time reading through a zcat pipe
my $start_zcat = Time::HiRes::time();
# List-form pipe open: the file name is not interpreted by a shell
open my $zcat, '-|', 'zcat', $file;
while (<$zcat>) {
    # print $_;
}
close $zcat;
my $end_zcat = Time::HiRes::time();

# Time reading with Compress::Zlib
my $start_zlib = Time::HiRes::time();
my $gz = gzopen($file, 'r') or die "Error reading $file: $gzerrno";
# http://blog-en.openalfa.com/how-to-read-and-write-compressed-files-in-perl
while ($gz->gzreadline($_) > 0) {
    # print "$_"; # Process the line read in $_
}
$gz->gzclose();
my $end_zlib = Time::HiRes::time();

printf("zlib took %f seconds.\n", $end_zlib - $start_zlib);
printf("zcat took %f seconds.\n", $end_zcat - $start_zcat);

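Using this script, I found that reading through zcat ran about 7 times faster than Compress::Zlib (!). Of course, this will vary with the computer and the file.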
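Applying that result to the original question, the pattern search can be run over a decompression pipe instead of Compress::Zlib. The following is only a minimal sketch: the helper name grep_gz and its callback argument are made up for illustration, and it assumes a gzip binary is on the PATH (gzip -dc does the same job as zcat).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original posts): stream a gzip file
# through an external `gzip -dc` process and call $on_match for every
# line that matches $regex. Returns the number of matching lines.
sub grep_gz {
    my ($file, $regex, $on_match) = @_;

    # List-form pipe open: no shell is involved, so odd characters in
    # the file name are passed to gzip unchanged.
    open(my $fh, '-|', 'gzip', '-dc', $file)
        or die "Cannot start gzip for $file: $!";

    my $count = 0;
    while (my $line = <$fh>) {
        next unless $line =~ $regex;
        chomp $line;            # drop the newline before handing the line on
        $on_match->($line);
        $count++;
    }

    # close() on a pipe returns false if the child exited with an error,
    # for example when the archive is truncated or corrupt.
    close($fh) or die "gzip failed on $file (exit status $?)";
    return $count;
}
```

It could be called as grep_gz('test.gz', qr/pattern/, sub { print "$_[0]----->PASS\n" }). Lines are handed to the callback one at a time rather than collected in an array, so memory use stays flat even for a multi-gigabyte file.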
