Pattern counter in a FASTA file



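I am trying to count the matches of a pattern in a FASTA file. I am starting with a FASTA file containing 57K sequences. For each sequence, I would like to pull out the number of pattern matches and show the start position of each match.

Input file: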

chr1 ATTAG**CAGAT**GTGACGTCGATGT**CAGAT**TG
chr2 TGAGCTG**CAGAT**CGTAGATGATTCTGCAGGAACCT
chr3 TCTTT**CAGAT**GCCTCTG**CAGAT**TC

Search pattern: "CAGAT"

Desired output:

chr   Count  P1  P2
chr1  2      6   25
chr2  1      8
chr3  2      6   19

Thanks in advance.

I am assuming your file is tab-separated.

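    use strict;
    use warnings;

    open(my $in, "<:utf8", "in.txt") or die "Cannot open file in.txt: $!\n";
    while (<$in>) {
        chomp;
        my $cur = $_;
        # print "iam $cur\n";        # debug output
        my @tt = split(/\t/, $cur);  # assuming your file is tab-separated
        my $s1 = $tt[1];             # the sequence
        my $s2 = "CAGAT";            # the pattern to search for
        my @val;
        print "$cur\t";
        while ($s1 =~ /($s2)/g) {
            # $-[0] is the 0-based offset of the start of the last successful match;
            # add 1 if you want 1-based positions as in the question.
            push(@val, $-[0]);
        }
        my $count = @val;                  # array in scalar context gives the number of matches
        my $positions = join(",", @val);
        print " No of Matches:$count Starting positions:$positions\n";
    }
    close($in);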

in.txt

chr1    ATTAGCAGATGTGACGTCGATGTCAGATTG
chr2    TGAGCTGCAGATCGTAGATGATTCTGCAGGAACCT
chr3    TCTTTCAGATGCCTCTGCAGATTC
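
If you want output closer to the table in the question (name, count, then 1-based start positions), a minimal sketch could look like the following. The helper name `match_positions` is hypothetical, the sequences are hard-coded from the question with the emphasis markers removed, and the pattern is treated as a literal string rather than a regex; adapt the reading part to your real file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based start positions of every
# non-overlapping occurrence of $pattern in $seq.
sub match_positions {
    my ($seq, $pattern) = @_;
    my @positions;
    # \Q...\E quotes the pattern so it is matched literally
    while ($seq =~ /\Q$pattern\E/g) {
        push @positions, $-[0] + 1;   # $-[0] is 0-based, so add 1
    }
    return @positions;
}

# Sample records copied from the question (assumption: name/sequence pairs).
my @records = (
    [ chr1 => "ATTAGCAGATGTGACGTCGATGTCAGATTG" ],
    [ chr2 => "TGAGCTGCAGATCGTAGATGATTCTGCAGGAACCT" ],
    [ chr3 => "TCTTTCAGATGCCTCTGCAGATTC" ],
);

print join("\t", "chr", "Count", "Positions"), "\n";
for my $rec (@records) {
    my ($name, $seq) = @$rec;
    my @pos = match_positions($seq, "CAGAT");
    # One row per sequence: name, number of matches, then each start position
    print join("\t", $name, scalar(@pos), @pos), "\n";
}
```

For the sequences as shown, counting from 1 gives 6 and 24 for chr1, 8 for chr2, and 6 and 18 for chr3, so the second positions in the question's desired output (25 and 19) seem to be off by one relative to this data.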
