I have a tab-delimited text file like this:
data S1 S2 S3 S4 S5 S6
data1 0 0 0 0 0 0
data2 0 5 3 5 0.1 0.9
data3 0 3 9 3 0 0.01
data4 0 0 4 4 0 0
data5 2 5 11 7 5 0.2
data6 0 0 0 8. 0 0
data7 0 1 5 2 06 0.04
Well, the real file is a bit more complex. It is a metagenomics file, something like:
D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__Marine Group II;D_5__uncultured archaeon 0 0 0 0 0 0 0 0 0.0035 0.00293 0.00834 0
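Everything from D_0__ to D_5__ is the first column ("data" in the example), and each number belongs to one sample column (S).
But in the end, it is the same layout!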
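What I want is to use the %row hash to extract the first column (data) together with one single @label_match column (for example S3), and print each pair to its own txt file. I mean, if I ask for S3 and S6, it should print something like this:
S3_file.txt (the file name is built from the column name):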
s3 data #avoid this line in the print out, just to explain !!!
0 data1
3 data2
9 data3
4 data4
11 data5
0 data6
5 data7
and
S6_file.txt:
0 data1
0.9 data2
0.01 data3
0 data4
0.2 data5
0 data6
0.04 data7
I have this code, and I think that in the %row part I need a foreach loop that extracts each @label_match one by one, but I don't know how to write it. Here is my code:
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
use Getopt::Long;
use List::Util qw(sum);
my ($infile_taxon, $search_label, $output_file, $help, $help_desc, $options, $options_desc, $keep_file);
# GetOptions needs a reference to each variable it should fill in
GetOptions (
    't=s' => \$infile_taxon,
    's=s' => \$search_label,
    'kf'  => \$keep_file,
    'o=s' => \$output_file,
    'h'   => \$help,
    'op'  => \$options
);
#---------------------------Subroutine to clean the selected Taxon ------------------
sub Taxon_Clean {
    my (@clean_result);
    foreach (@_){
        chomp;
        # Strip the leading D_0__ and turn every ";D_n__" or ";" into a tab
        if ($_ =~ s/D_0__//g | s/;D_\d__/\t/g | s/;/\t/g){
            push @clean_result, $_;
        }
    }
    return @clean_result;
}
#------------------------------------------------------ Open Files-------------------
open INFILE_TAXONOMY, '<', "$infile_taxon" or die $!;
my (@taxon, @sample_names);
#------------------------------------------------------ Taxon -----------------------
my ( @header, @label_match, @not_match, @taxon_filter);
while (<INFILE_TAXONOMY>){
    chomp;
    if ($_=~ m/^$|Constructed from biom file/g) {
        next;
    }
    elsif ($_=~ s/OTU ID/Taxon/g){
        chomp ( @header = split '\t', $_ );
        #------------------------------------------------------ Search Label ----------------
        if ($search_label){
            my @label_wanted= split (/,/, $search_label);
            unshift @label_wanted, '#Taxon';
            @label_wanted = uniq (@label_wanted);
            foreach (@label_wanted){
                my $unit =$_;
                chomp $unit;
                if (my @match_wanted= grep (/$unit/, @header)){
                    push (@label_match, @match_wanted);
                }
                else {
                    push (@not_match, $unit);
                }
            }
            # --------- Check Point ---------
            push (my @defined_elements, @label_match);
            shift @defined_elements;
            if (! @defined_elements){
                print "\n\tNONE of the Search Samples \" $search_label \" "
                    . "Were Found in \" $infile_taxon \" File !!!\n\n";
                exit;
            }
            elsif (grep {defined($_)} @defined_elements){
                if (grep {defined($_)} @not_match){
                    print "\n\tSamples Not Found: @not_match\n\n";
                }
            }
        }
    }
    elsif ($_=~ m/^#/g){
        next;
    }
    elsif ($search_label) {
        # Key each field of the current line by its column header
        my %row;
        @row{@header} = split '\t';
        my @filter= join "\t", @row{@label_match}, "\n";
        push (@taxon_filter, @filter);
        #print Dumper (\%row);
    }
    else {
        push (@taxon, $_);
    }
}
# The next section extracts all the wanted columns into a single file,
# but here is where I want to extract the columns one by one, each in a separate file !!!
open OUTPUT, '>', "Taxonomic_results_file.txt" or die "can't create the output file: $!";
foreach (@taxon_filter){
    chomp $_;
    my ($tax, @values) = split '\t', $_;
    my $unit_val = join("\t", map { $_ } @values);
    my $sum_elements = sum (@values);
    # Skip taxa whose selected samples are all zero
    if ($sum_elements == 0){
        next;
    }
    else {
        push (my @tx, $tax);
        @tx = Taxon_Clean (@tx);
        print OUTPUT "$unit_val\t@tx\n";
    }
}
close INFILE_TAXONOMY;
close OUTPUT;
exit;
Thanks a lot
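You're already doing most of the work with the @row{@header} syntax. That is a hash slice, which means you can read or assign several elements at once by listing their hash keys.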
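As a quick illustration of what a hash slice does in both directions, here is a minimal sketch that uses the header and the data2 row from the sample file. The names $line and @picked are made up for this example and do not appear in the script above.

```perl
use strict;
use warnings;

# Column names and one data line, copied from the sample file in the question.
my @header = qw(data S1 S2 S3 S4 S5 S6);
my $line   = join "\t", qw(data2 0 5 3 5 0.1 0.9);

# Assigning to a slice pairs each header with the field in the same position.
my %row;
@row{@header} = split /\t/, $line;

# Reading a slice returns the values for the listed keys, in the listed order.
my @picked = @row{qw(S3 data)};
print join("\t", @picked), "\n";    # prints 3, a tab, then data2
```

The order of the keys in the slice decides the order of the output fields, so swapping the value in front of the row name needs no extra code.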
Output works in much the same way:
open ( my $s3_file, '>', 'S3_file.txt' ) or warn $!;
my @output_fields = qw ( S3 data ); # must match the column headings exactly (hash keys are case-sensitive)
and further down, inside the %row block:
print {$s3_file} join ("\t", @row{@output_fields} ), "\n";
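To get one file per requested column, the same idea could be wrapped in a loop over the labels. The following is only a sketch under a few assumptions: the rows are a hard-coded subset of the sample data rather than the real input file, and @wanted, %out_for and $name_col are names invented here (in the script, the labels would come from @label_match).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the input: the header plus three sample rows.
my @lines = map { join "\t", @$_ } (
    [qw(data  S1 S2 S3 S4 S5  S6)],
    [qw(data1 0  0  0  0  0   0)],
    [qw(data2 0  5  3  5  0.1 0.9)],
    [qw(data3 0  3  9  3  0   0.01)],
);
my @wanted = qw(S3 S6);    # assumed list of columns to extract

my @header   = split /\t/, shift @lines;
my $name_col = $header[0]; # the first column holds the row name ("data")

# Open one output handle per requested column, keyed by the column name.
my %out_for;
for my $label (@wanted) {
    my $file = "${label}_file.txt";
    open( $out_for{$label}, '>', $file ) or die "Cannot create $file: $!";
}

for my $line (@lines) {
    my %row;
    @row{@header} = split /\t/, $line;
    # Write "value<TAB>name" to the file that belongs to each column.
    for my $label (@wanted) {
        print { $out_for{$label} } join( "\t", @row{ $label, $name_col } ), "\n";
    }
}

for my $fh ( values %out_for ) {
    close($fh) or die "Cannot close output file: $!";
}
```

Keeping the handles in a hash means adding another column is just another entry in @wanted. Rows whose value is 0 are still written, as in the expected output shown in the question.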