Find values that fall within a range defined by two values

I have two tab-separated tables:

table1
col1    col2    col3    col4
id1     chr1     1       10
id2     chr1     15      20
id3     chr1     30      35

table2
col1    col2    col3
rs1     5       chr1
rs2     11      chr1
rs3     34      chr1
rs4     35      chr1

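I want to check whether any value in col2 of table2 lies between the values in col3 and col4 of table1. If it does, I want to print the corresponding col1 and col2 values from table2 into a new column of table1.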

So, in this example, the final result file should look like this:

table output
col1    col2   col3   col4   new_col1
id1     chr1    1      10     rs1:5
id2     chr1    15     20
id3     chr1    30     35     rs3:34, rs4:35

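I have a few questions here:
- I think I should use two while loops.
- Normally, if I want to store values, I use a hash and then check whether the other table contains a matching value. Here, however, I have to store two values, because I need to check whether a table2 value lies within the range defined by two values in table1.
- How do I store the values in new_col1?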

I came up with something like this to store the ranges (I'm using Perl):

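use strict;
use warnings;
open my $table1, '<', 'table1.txt' or die "table1.txt: $!";
my @range;    # $range[$pos] = comma-separated ids of the table1 rows covering $pos
while (<$table1>) {
    chomp;
    my @cols = split /\t/;
    next unless defined $cols[3] && $cols[2] =~ /^\d+$/;    # skip header and blank lines
    $range[$_] .= "$cols[0]," for $cols[2] .. $cols[3];     # store the ranges
}
chop for grep { defined } @range;    # drop the trailing comma from each stored entry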

But how do I then compare this with $table2?
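One way to finish the position-indexed idea from the code above is to look each table2 position up directly in the array. This is a minimal sketch under stated assumptions, not the asker's code: build_lookup and match_positions are hypothetical helper names, the tables are hard-coded as in-memory rows instead of being read from files, and the chromosome column is ignored. Because the array has one slot per coordinate, it only makes sense for small coordinates like the ones in the example; real genomic positions would make the array enormous.

```perl
use strict;
use warnings;

# Hypothetical helper: build a position-indexed lookup from table1 rows.
# $lookup->[$pos] holds the ids of every table1 row whose [col3, col4] covers $pos.
sub build_lookup {
    my (@rows) = @_;    # each row: [id, chrom, start, end]
    my @lookup;
    for my $row (@rows) {
        my ($id, undef, $start, $end) = @$row;
        push @{ $lookup[$_] }, $id for $start .. $end;
    }
    return \@lookup;
}

# Hypothetical helper: look each table2 position up directly in the array.
# Returns a hash ref: table1 id => list of "rsid:pos" strings.
sub match_positions {
    my ($lookup, @snps) = @_;    # each snp: [rsid, pos, chrom]
    my %hits;
    for my $snp (@snps) {
        my ($rs, $pos) = @$snp;
        my $ids = $lookup->[$pos] or next;    # no table1 row covers $pos
        push @{ $hits{$_} }, "$rs:$pos" for @$ids;
    }
    return \%hits;
}

# The question's tables as in-memory rows
my @table1 = ( [qw(id1 chr1 1 10)], [qw(id2 chr1 15 20)], [qw(id3 chr1 30 35)] );
my @table2 = ( [qw(rs1 5 chr1)], [qw(rs2 11 chr1)], [qw(rs3 34 chr1)], [qw(rs4 35 chr1)] );

my $hits = match_positions( build_lookup(@table1), @table2 );
for my $row (@table1) {
    print join( "\t", @$row, join( ', ', @{ $hits->{ $row->[0] } || [] } ) ), "\n";
}
```

With these rows, id1 gets rs1:5, id2 gets an empty column and id3 gets rs3:34, rs4:35, matching the desired output.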

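Update: I don't only want to check whether a value in col2 of table2 lies between the values in col3 and col4 of table1. I also need to check whether col2 of table1 matches col3 of table2. Only if there is a match should the first check be done (whether the value in col2 of table2 lies between the values in col3 and col4 of table1).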
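A minimal sketch of the updated requirement, assuming each table1 row is [id, chrom, start, end] and each table2 row is [rsid, pos, chrom]: group table2 by chromosome in a hash of arrays, so the range check only ever sees entries on the same chromosome. group_by_chrom and new_col1 are hypothetical helper names, and the sample rows are the question's tables with id3 and rs4 moved to a hypothetical chr2 purely to show the filter at work.

```perl
use strict;
use warnings;

# Hypothetical helper: group table2 rows [rsid, pos, chrom] by chromosome,
# so a table1 row is only compared with entries on its own chromosome.
sub group_by_chrom {
    my %by_chrom;
    push @{ $by_chrom{ $_->[2] } }, $_ for @_;
    return \%by_chrom;
}

# Hypothetical helper: build new_col1 for one table1 row [id, chrom, start, end].
sub new_col1 {
    my ($by_chrom, $row) = @_;
    my (undef, $chrom, $start, $end) = @$row;
    my @hits = grep { $_->[1] >= $start && $_->[1] <= $end }    # then the range check
               @{ $by_chrom->{$chrom} || [] };                 # chromosome match first
    return join ', ', map { "$_->[0]:$_->[1]" } @hits;
}

# Question's data, with id3 and rs4 moved to a hypothetical chr2
my @table1 = ( [qw(id1 chr1 1 10)], [qw(id2 chr1 15 20)], [qw(id3 chr2 30 35)] );
my @table2 = ( [qw(rs1 5 chr1)], [qw(rs2 11 chr1)], [qw(rs3 34 chr1)], [qw(rs4 35 chr2)] );

my $by_chrom = group_by_chrom(@table2);
print join( "\t", @$_, new_col1( $by_chrom, $_ ) ), "\n" for @table1;
```

With these rows, id1 gets rs1:5, id2 gets an empty column, and id3 gets only rs4:35, because rs3 lies on chr1.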

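This does what you asked for. It works by reading all the information from table2 into an array of arrays, @table2. table1 is then processed line by line: the fifth column is computed from the data collected in @table2, and the result is printed to STDOUT.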

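use strict;
use warnings;
use 5.010;
use autodie;
# Read table2 into an array of arrays: one [col1, col2, col3] ref per line
my @table2;
open my $fh, '<', 'table2.txt';
while (<$fh>) {
  my @columns = split;
  next unless @columns;              # skip blank lines
  next if $columns[1] =~ /\D/;       # skip the header line
  push @table2, \@columns;
}
# Process table1 line by line, appending the new fifth column
open $fh, '<', 'table1.txt';
while (<$fh>) {
  my @columns = split;
  next unless @columns;
  if ( grep /\D/, @columns[2,3] ) {  # header line
    push @columns, 'new_col1';
  }
  else {
    # table2 rows whose col2 lies between col3 and col4 of this row
    my @matches = grep { $_->[1] >= $columns[2] and $_->[1] <= $columns[3] } @table2;
    push @columns, join(', ', map { join(':', @$_[0,1]) } @matches);
  }
  print join("\t", @columns), "\n";
}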

Output

col1  col2  col3  col4  new_col1
id1 ... 1 10  rs1:5
id2 ... 15  20  
id3 ... 30  35  rs3:34, rs4:35

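I think you're approaching this problem backwards. Parsing table2 into a hash first makes it much easier, because you can then iterate over table1 and check for any values within the relevant range.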

use strict;
use warnings;
use Data::Dumper;
my %table2;
while (<DATA>) {
    # stop reading if we've finished with table2
    last if m/^table1/;
    next unless m/^rs/;
    my ( $col1, $col2 ) = split(/\s+/);
    $table2{$col1} = $col2;
}
print Dumper \%table2;
while (<DATA>) {
    next unless m/^id/;
    chomp;
    my ( $rowid, $col2, $lower, $upper ) = split(/\s+/);
    my $newcol = "";
    # collect every table2 id whose position lies in [$lower, $upper]
    foreach my $rs ( keys %table2 ) {
        if (    $table2{$rs} >= $lower
            and $table2{$rs} <= $upper )
        {
            $newcol .= " $rs:$table2{$rs}";
        }
    }
    print join( "\t", $rowid, $col2, $lower, $upper, $newcol ), "\n";
}

__DATA__
table2
col1    col2
rs1     5   
rs2     11
rs3     34
rs4     35
table1
col1    col2    col3    col4
id1     ...     1       10
id2     ...     15      20
id3     ...     30      35

Output

$VAR1 = {
          'rs1' => '5',
          'rs2' => '11',
          'rs4' => '35',
          'rs3' => '34'
        };
id1 ... 1 10   rs1:5
id2 ... 15  20  
id3 ... 30  35   rs4:35 rs3:34
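Note that keys %table2 returns keys in an unspecified order, which is why rs4:35 is printed before rs3:34 here and why the order can change between runs. A minimal sketch of a deterministic variant, assuming the same rsid => position hash: sort the matching ids by position before joining. in_range_sorted is a hypothetical helper name.

```perl
use strict;
use warnings;

# Same data as the __DATA__ table2 above: rsid => position
my %table2 = ( rs1 => 5, rs2 => 11, rs3 => 34, rs4 => 35 );

# Hypothetical helper: return the "rsid:pos" entries inside [$lower, $upper],
# sorted by position (then by id) so the output does not depend on hash order.
sub in_range_sorted {
    my ($lower, $upper) = @_;
    my @ids = sort { $table2{$a} <=> $table2{$b} or $a cmp $b }
              grep { $table2{$_} >= $lower && $table2{$_} <= $upper }
              keys %table2;
    return join ' ', map { "$_:$table2{$_}" } @ids;
}

print in_range_sorted(30, 35), "\n";    # rs3:34 rs4:35
```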
