Perl hash comparison



>I have two different files that contain the same kind of data, i.e.

KEY_gl Start_gl   End_gl
1   114029  17
2   284 1624
3   1803    2942
4   3070    3282
5   3295    4422 
KEY_gm Start_gm   End_gm
1   115000  17
2   284 1624
3   1803    2942
4   3070    3282
5   3295    4422 

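I have stored these two files in hashes. The KEY column is the hash key, and Start and End are the values stored for each key.

I wrote some code to compare the two hashes and print the keys whose values are the same and the keys whose values differ.

The code is: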

my %hash_gl = ();
my %hash_gm = ();
open( my $fgl, "/home/gaurav/GMR/new_gl.txt" ) or die "Can't open the file";
while ( my $line_gl = <$fgl> ) {
    chomp $line_gl;
    my ( $key_gl, $start_gl, $end_gl ) = split( "\t", $line_gl );
    $hash_gl{$key_gl} = [ $start_gl, $end_gl ];
}
while ( my ( $key_gl, $val_gl ) = each %hash_gl ) {
    #print "$key_gl => @{$val_gl}\n";
}
open( my $fgm, "/home/gaurav/GMR/new_gm.txt" ) or die "Can't open the file";
while ( my $line_gm = <$fgm> ) {
    chomp $line_gm;
    my ( $key_gm, $start_gm, $end_gm ) = split( "\t", $line_gm );
    $hash_gm{$key_gm} = [ $start_gm, $end_gm ];
}
while ( my ( $key_gm, $val_gm ) = each %hash_gm ) {
    #print "$key_gm => @{$val_gm}\n";
}
for ( sort keys %hash_gl ) {
    unless ( exists $hash_gm{$_} ) {
        print "$_: not found in second hash\n";
        next;
    }
    if ( $hash_gm{$_} == $hash_gl{$_} ) {
        print "$_: values are equal\n";
    }
    else {
        print "$_: values are not equal\n";
    }
}

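Please tell me what is wrong with it, because I am not getting the output I want. Also, sorry, I am new to this forum, so I could not format the post properly.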

After the files have been read, your two hashes look like this. I created the output with the dd function from Data::Dump.

my %hash_gl = (
    1      => [ 114029,     17 ],
    2      => [ 284,        1624 ],
    3      => [ 1803,       2942 ],
    4      => [ 3070,       3282 ],
    5      => [ 3295,       442 ],
    KEY_gl => [ "Start_gl", "End_gl" ],
);
my %hash_gm = (
    1      => [ 115000,     17 ],
    2      => [ 284,        1624 ],
    3      => [ 1803,       2942 ],
    4      => [ 3070,       3282 ],
    5      => [ 3295,       4422 ],
    KEY_gm => [ "Start_gm", "End_gm" ],
);

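As you can see, the values are array references. You put them into array references when you wrote $hash_gl{$key_gl} = [ $start_gl, $end_gl ]; (and the same for gm).

When you compare the two, you are using ==, which is the numeric comparison operator. If you print one of the $hash_gm{$_} values, you will get something like this: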

ARRAY(0x3bb114)

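That is because it is an array reference. You cannot compare the contents of two arrays with ==: applied to references, it compares their memory addresses, so two separately built arrays never compare equal, even when they hold the same values.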
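To make that concrete, here is a minimal sketch using made-up data. The names $first, $second and $alias are illustrative and do not come from the question. In numeric context a reference evaluates to its address, so == only tells you whether both sides point at the very same array.

```perl
use strict;
use warnings;

# Two separate anonymous arrays with identical contents (illustrative data).
my $first  = [ 284, 1624 ];
my $second = [ 284, 1624 ];
my $alias  = $first;    # a second reference to the same array as $first

# == compares addresses, not contents.
my $same_contents_equal = ( $first == $second ) ? 1 : 0;
my $same_array_equal    = ( $first == $alias )  ? 1 : 0;

print "distinct arrays, same contents: ",
    ( $same_contents_equal ? "equal" : "not equal" ), "\n";
print "same array via two references: ",
    ( $same_array_equal ? "equal" : "not equal" ), "\n";
```

The first line reports "not equal" and the second "equal", which matches what the loop in the question does for every key.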

You now have two options:

  • You can do the comparison yourself. To do that, you need to go into the array references and compare each value:

    if (   $hash_gm{$_}->[0] == $hash_gl{$_}->[0]
        && $hash_gm{$_}->[1] == $hash_gl{$_}->[1] )
    {
        print "$_: values are equal\n";
    }
    else {
        print "$_: values are not equal\n";
    }
    
  • You can install and use Array::Utils:

    use Array::Utils 'array_diff';
    # later...
    if ( !array_diff( @{ $hash_gm{$_} }, @{ $hash_gl{$_} } ) ) {
        print "$_: values are equal\n";
    }
    else {
        print "$_: values are not equal\n";
    }
    

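I would go with the first solution. It is more readable because you do not need the extra dereferencing, and installing a module is not worth the effort just to save half a line of code.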
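If the arrays might one day hold more than two values, the hand-written comparison can be wrapped in a small helper. This is only a sketch: arrays_equal is a hypothetical name, the sample hashes are shortened copies of the data above, and it assumes every element is numeric.

```perl
use strict;
use warnings;

# Hypothetical helper: true when both array refs hold the same numbers
# in the same order.
sub arrays_equal {
    my ( $left, $right ) = @_;
    return 0 unless @$left == @$right;    # different lengths
    for my $i ( 0 .. $#$left ) {
        return 0 unless $left->[$i] == $right->[$i];
    }
    return 1;
}

# Shortened sample data, for illustration only.
my %hash_gl = ( 1 => [ 114029, 17 ], 2 => [ 284, 1624 ] );
my %hash_gm = ( 1 => [ 115000, 17 ], 2 => [ 284, 1624 ] );

for my $key ( sort keys %hash_gl ) {
    unless ( exists $hash_gm{$key} ) {
        print "$key: not found in second hash\n";
        next;
    }
    my $verdict = arrays_equal( $hash_gl{$key}, $hash_gm{$key} ) ? "equal" : "not equal";
    print "$key: values are $verdict\n";
}
```

For text values such as the header row, the == inside the helper would have to become eq.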

Assuming you want to compare the values, say the start positions, this is how I would do it:

use warnings;
use strict;

open my $in,  '<', '1.txt' or die "$!\n";
open my $in2, '<', '2.txt' or die "$!\n";
my ( %hash1, %hash2 );

while (<$in>) {
    chomp;
    # Keep only lines made of three numeric columns; this skips the header.
    next unless /^\s*(\d+)\s+(\d+)\s+(\d+)/;
    $hash1{$1} = [ $2, $3 ];
}
while (<$in2>) {
    chomp;
    next unless /^\s*(\d+)\s+(\d+)\s+(\d+)/;
    $hash2{$1} = [ $2, $3 ];
}

for my $key ( sort keys %hash1 ) {
    next unless exists $hash2{$key};    # key is only present in file 1
    if ( $hash1{$key}[0] == $hash2{$key}[0] ) {
        print "start matches: file1 $hash1{$key}[0]\tfile2 $hash2{$key}[0]\n";
    }
    else {
        print "start doesn't match: file1 $hash1{$key}[0]\t file2 $hash2{$key}[0]\n";
    }
}
#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';

my %hash_gl = (
    1 => [ 114029, 17 ],
    2 => [ 284,    1624 ],
    3 => [ 1803,   2942 ],
    4 => [ 3070,   3282 ],
    5 => [ 3295,   442 ],
);
my %hash_gm = (
    1 => [ 115000, 17 ],
    2 => [ 284,    1624 ],
    3 => [ 1803,   2942 ],
    4 => [ 3070,   3282 ],
    5 => [ 3295,   4422 ],
);

# Report whether both hashes hold the same number of keys.
sub check_hash_size {
    my ( $hash_gl, $hash_gm ) = @_;
    if ( scalar( keys %$hash_gl ) != scalar( keys %$hash_gm ) ) {
        say "the hashes are 2 different sizes";
    }
    else {
        say "the hashes are the same size";
    }
}

# Compare the [start, end] arrays of both hashes key by key.
sub diag_hashes {
    my ( $hash_gl, $hash_gm ) = @_;
    for my $gl_key ( sort keys %$hash_gl ) {
        my $gl = $hash_gl->{$gl_key};
        my $gm = $hash_gm->{$gl_key};

        if ( @$gl != @$gm ) {
            say "$gl_key entry arrays are different sizes";
        }
        else {
            say "arrays are the same size for key $gl_key";
        }

        # Each array is checked on its own against the limit of 2 values.
        if ( @$gl > 2 or @$gm > 2 ) {
            say "$gl_key entry array exceeds 2 values";
        }

        if ( $gl->[0] eq $gm->[0] ) {
            say "$gl_key start is the same in both hashes";
        }
        else {
            say "** key $gl_key start is different";
        }

        if ( $gl->[1] eq $gm->[1] ) {
            say "$gl_key end is the same in both hashes";
        }
        else {
            say "** key $gl_key end is different";
        }
    }
}

# Pass references; passing the hashes directly would flatten them into one list.
check_hash_size( \%hash_gl, \%hash_gm );
diag_hashes( \%hash_gl, \%hash_gm );
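
Since the question asks for the matching and the non-matching keys, the same checks can also collect the keys into lists instead of printing as they go. The following is a rough sketch under a few assumptions: partition_keys is a hypothetical helper, the sample data is invented (key 6 exists only to show the "missing" case), and keys that appear only in the second hash are not reported.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the keys of the first hash into three buckets,
# depending on how their [start, end] pair compares with the second hash.
sub partition_keys {
    my ( $gl, $gm ) = @_;
    my %result = ( same => [], different => [], missing => [] );
    for my $key ( sort keys %$gl ) {
        if ( !exists $gm->{$key} ) {
            push @{ $result{missing} }, $key;
        }
        elsif ($gl->{$key}[0] == $gm->{$key}[0]
            && $gl->{$key}[1] == $gm->{$key}[1] )
        {
            push @{ $result{same} }, $key;
        }
        else {
            push @{ $result{different} }, $key;
        }
    }
    return \%result;
}

# Invented sample data, for illustration only.
my %sample_gl = ( 1 => [ 114029, 17 ], 2 => [ 284, 1624 ], 6 => [ 10, 20 ] );
my %sample_gm = ( 1 => [ 115000, 17 ], 2 => [ 284, 1624 ] );

my $report = partition_keys( \%sample_gl, \%sample_gm );
print "$_: @{ $report->{$_} }\n" for qw(same different missing);
```

With this sample data, key 2 lands in "same", key 1 in "different" and key 6 in "missing".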
