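I'm trying to count pattern matches in a FASTA file. I'm starting from a FASTA file that contains 57K sequences. For each sequence, I want the number of matches of the pattern and the start position of each match.
Input file: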
chr1 ATTAG**CAGAT**GTGACGTCGATGT**CAGAT**TG
chr2 TGAGCTG**CAGAT**CGTAGATGATTCTGCAGGAACCT
chr3 TCTTT**CAGAT**GCCTCTG**CAGAT**TC
Search pattern: "cagat"
Desired output:
chr Count P1 P2
chr1 2 6 25
chr2 1 8
chr3 2 6 19
Thanks in advance.
I'm assuming your file is tab-separated.
use strict;
use warnings;

open(my $in, "<:utf8", "in.txt") or die "Cannot open file in.txt: $!\n";
while (my $cur = <$in>) {
    chomp($cur);
    # print "I am $cur\n";
    my @tt = split(/\t/, $cur);   # assuming your file is tab-separated
    my $s1 = $tt[1];              # the sequence is the second column
    my $s2 = "CAGAT";             # pattern to search for
    my @val;
    print "$cur\t";
    while ($s1 =~ /($s2)/g) {
        push(@val, $-[0]);   # $-[0] is the 0-based offset of the start of the last successful match
    }
    my $count = @val;                  # an array in scalar context gives its length
    my $positions = join(",", @val);
    print " No of Matches:$count Starting positions:$positions\n";
}
close($in);
in.txt
chr1 ATTAG**CAGAT**GTGACGTCGATGT**CAGAT**TG
chr2 TGAGCTG**CAGAT**CGTAGATGATTCTGCAGGAACCT
chr3 TCTTT**CAGAT**GCCTCTG**CAGAT**TC
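Two details are worth noting about the script above: `$-[0]` is a 0-based offset, while the first position in the desired output (6 for chr1) counts from 1, and the question writes the pattern in lowercase ("cagat") while the sequences are uppercase. The following is a minimal sketch of one way to handle both, not a definitive implementation. The helper name `find_positions` and the sample records are hypothetical and are used only for illustration; they are not the asker's data.

```perl
use strict;
use warnings;

# Return the 1-based start position of every match of $pattern in $seq.
# The match is case-insensitive, so "cagat" also finds "CAGAT".
# find_positions is a hypothetical helper name, not part of the original answer.
sub find_positions {
    my ($seq, $pattern) = @_;
    my @positions;
    # \Q...\E quotes the pattern so it is matched literally.
    while ($seq =~ /\Q$pattern\E/gi) {
        push @positions, $-[0] + 1;   # $-[0] is 0-based, so add 1
    }
    return @positions;
}

# Hypothetical sample records (name, sequence), made up for this sketch.
my @records = (
    [ 'seqA', 'AACAGATTT'  ],
    [ 'seqB', 'CAGATCAGAT' ],
    [ 'seqC', 'GGGG'       ],
);

# One tab-separated row per sequence: name, match count, then each position.
print join("\t", 'chr', 'Count', 'Positions'), "\n";
for my $rec (@records) {
    my ($name, $seq) = @$rec;
    my @pos = find_positions($seq, 'cagat');
    print join("\t", $name, scalar(@pos), @pos), "\n";
}
```

Matches found this way do not overlap, because a `/g` match resumes after the end of the previous one. That is enough for "CAGAT", which cannot overlap itself, but a pattern such as "AAA" would need a lookahead if overlapping hits should be counted.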