Perl: extract multiple columns from a hash



I have a tab-delimited text file like this:

data    S1  S2  S3  S4  S5  S6
data1   0   0   0   0   0   0
data2   0   5   3   5   0.1 0.9
data3   0   3   9   3   0   0.01
data4   0   0   4   4   0   0
data5   2   5   11  7   5   0.2
data6   0   0   0   8.  0   0
data7   0   1   5   2   06  0.04

Well, the real file's structure is a bit more complex. It is a metagenomics file, with lines like:

D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__Marine Group II;D_5__uncultured archaeon 0 0 0 0 0 0 0 0 0.0035 0.00293 0.00834 0

Everything from D_0__ to D_5__ is the first column (data in the example), and each number is one sample column (S1, S2, ...).

But in the end, it's the same thing!!!
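As a minimal sketch of that structure, assuming the real file separates the taxonomy string and the numbers with tabs (the line above is displayed with spaces), one row can be split into its taxonomy string and its sample values. parse_row is a hypothetical helper, and its prefix cleanup only approximates what the question's Taxon_Clean sub does.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited line into the taxonomy string
# (first column) and the list of per-sample values (S1, S2, ...).
sub parse_row {
    my ($line) = @_;
    chomp $line;
    my ($taxonomy, @values) = split /\t/, $line;
    # Strip the leading D_0__ prefix, then turn each ";D_n__" separator
    # into a tab, similar in spirit to the question's Taxon_Clean sub.
    $taxonomy =~ s/^D_\d+__//;
    $taxonomy =~ s/;D_\d+__/\t/g;
    return ($taxonomy, @values);
}

my $line = join("\t",
    'D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata',
    0, 0, 0.0035, 0.00293) . "\n";
my ($taxon, @values) = parse_row($line);
print "$taxon\n";                            # Archaea, Euryarchaeota, Thermoplasmata, tab-separated
print scalar(@values), " sample values\n";   # 4 sample values
```

The first field becomes the row label, and everything after it lines up with the S1, S2, ... header names.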

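What I want is to use the %row hash to extract the first column (data) together with one @label_match column at a time (e.g. S3), and print each pair to its own txt file. I mean, if I want S3 and S6, print out something like this: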

S3_file.txt (each output file is named after its column):

s3   data #avoid this line in the print out, just to explain !!!
0    data1
3    data2
9    data3
4    data4
11   data5
0    data6
5    data7

S6_file.txt:

0    data1
0.9  data2
0.01 data3
0    data4
0.2  data5
0    data6
0.04 data7

I have this code. I think that in the %row part I need a foreach loop to extract each @label_match column one by one, but I don't know how. Here is my code:

#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
use Getopt::Long;
use List::Util qw(sum);

my ($infile_taxon, $search_label, $output_file, $help, $help_desc, $options, $options_desc, $keep_file);
GetOptions (
't=s'       => \$infile_taxon,
's=s'       => \$search_label,
'kf'        => \$keep_file,
'o=s'       => \$output_file,
'h'         => \$help,
'op'        => \$options
);
#---------------------------Subroutine to clean the selected taxon --------------------
sub Taxon_Clean {
my (@clean_result);
foreach (@_){
chomp;
if ($_ =~ s/D_0__//g | s/;D_\d__/\t/g | s/;/\t/g){
push @clean_result, $_;
}
}
return @clean_result;
}
#------------------------------------------------------ Open Files-------------------
open INFILE_TAXONOMY, '<', "$infile_taxon" or die $!;
my (@taxon, @sample_names);
#------------------------------------------------------ Taxon -----------------------
my ( @header, @label_match, @not_match, @taxon_filter);
while (<INFILE_TAXONOMY>){
chomp;
if ($_=~ m/^$|Constructed from biom file/g)  {
next;
}
elsif ($_=~ s/OTU ID/Taxon/g){
chomp ( @header = split /\t/, $_ );
#------------------------------------------------------ Search Label ----------------
if ($search_label){
my @label_wanted= split (/,/, $search_label); 
unshift @label_wanted, '#Taxon';
@label_wanted = uniq (@label_wanted);
foreach (@label_wanted){
my $unit =$_;
chomp $unit;
if (my @match_wanted= grep (/$unit/, @header)){
push (@label_match, @match_wanted);
}
else {
push (@not_match, $unit);
}
}
#                                --------- Check Point ---------
push (my @defined_elements, @label_match);
shift @defined_elements;
if (! @defined_elements){
print "\n\tNONE of the Search Samples \"$search_label\" "
. "Were Found in \"$infile_taxon\" File !!!\n\n";
exit;
}
elsif (grep {defined($_)} @defined_elements){  
if (grep {defined($_)} @not_match){
print "\n\tSamples Not Found: @not_match\n\n";
}
}
}
}
elsif ($_=~ m/^#/g){
next;
}
elsif ($search_label) {  
my %row;
@row{@header} = split /\t/;
my @filter= join "\t", @row{@label_match}, "\n";
push (@taxon_filter, @filter);
#print Dumper (%row);
}
else {
push (@taxon, $_); 
}
}

# The next section extracts all the wanted columns into a single file,
# but this is where I want to extract each column into a separate file !!!

open OUTPUT, '>', "Taxonomic_results_file.txt" or die "can't create the output file: $!";
foreach (@taxon_filter){
chomp $_;
my ($tax, @values) = split /\t/, $_;
my $unit_val = join("\t", @values);
my $sum_elements = sum (@values);
if ($sum_elements == 0){
next;
}
else {
push (my @tx, $tax);
@tx = Taxon_Clean (@tx);
print OUTPUT "$unit_val\t@tx\n";
}
}

close INFILE_TAXONOMY;
close OUTPUT;
exit;

Thanks a lot.

You're already doing most of the work with the @row{@header} syntax. That's a hash slice, which lets you access several hash elements at once by their keys.
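A small illustration of both directions of a hash slice, using the header and one data row from the question's sample file (the variable names here are only for illustration):

```perl
use strict;
use warnings;

# Header row and one data row, as in the question's sample file.
my @header = qw(data S1 S2 S3 S4 S5 S6);
my $line   = join "\t", qw(data2 0 5 3 5 0.1 0.9);

# Hash slice assignment: each header name becomes the key for the matching field.
my %row;
@row{@header} = split /\t/, $line;

# Hash slice lookup: fetch several values at once, in the order of the keys.
my @wanted = @row{qw(S3 data)};
print join("\t", @wanted), "\n";    # prints 3, a tab, then data2
```

Because the lookup returns values in key order, choosing the key list is all it takes to pick and reorder output columns.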

Output works in much the same way:

open ( my $s3_file, '>', 'S3_file.txt' ) or warn $!;
my @output_fields = qw ( s3 data ); #matches column headings

And further down, inside the %row block:

print {$s3_file} join ("\t", @row{@output_fields} ), "\n";
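Generalising that idea to one file per requested column, a minimal sketch could keep one filehandle per column in a hash and print a slice of %row to each on every line. It assumes the first header field labels the rows ("data" in the sample, "#Taxon" in the real file after the OTU ID rename) and reads the sample data from an in-memory string so it runs as-is; in real use you would open the file given with -t and take @wanted from -s.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample data from the question, built with explicit tabs; in real use,
# open the file passed with -t instead of this in-memory string.
my $input = join '', map { join("\t", @$_) . "\n" } (
    [qw(data  S1 S2 S3 S4 S5  S6)],
    [qw(data1 0  0  0  0  0   0)],
    [qw(data2 0  5  3  5  0.1 0.9)],
    [qw(data3 0  3  9  3  0   0.01)],
    [qw(data4 0  0  4  4  0   0)],
    [qw(data5 2  5  11 7  5   0.2)],
    [qw(data6 0  0  0  8. 0   0)],
    [qw(data7 0  1  5  2  06  0.04)],
);
open my $in, '<', \$input or die "Cannot open input: $!";

# Columns to export, e.g. what the question passes as -s S3,S6.
my @wanted = qw(S3 S6);

# The header names every column; the first one labels the rows.
chomp(my $header_line = <$in>);
my @header    = split /\t/, $header_line;
my $label_col = $header[0];

# Refuse unknown column names up front instead of printing undef later.
my %known = map { $_ => 1 } @header;
for my $col (@wanted) {
    die "Column '$col' not found in header\n" unless $known{$col};
}

# One output filehandle per wanted column, file named after the column.
my %out_fh;
for my $col (@wanted) {
    open my $fh, '>', "${col}_file.txt" or die "Cannot create ${col}_file.txt: $!";
    $out_fh{$col} = $fh;
}

while (my $line = <$in>) {
    chomp $line;
    next if $line eq '';
    my %row;
    @row{@header} = split /\t/, $line;    # hash slice: column name => field
    for my $col (@wanted) {
        # Hash slice again: the wanted value followed by the row label.
        print { $out_fh{$col} } join("\t", @row{$col, $label_col}), "\n";
    }
}

close $_ for values %out_fh;
close $in;
```

Keeping the filehandles in a hash keyed by column name means the per-row loop never needs to know how many columns were requested.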
