Rosalind Consensus and Profile



I'm trying to solve this problem: http://rosalind.info/problems/cons/

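My script fills the counter lists and outputs a consensus string of the same length. I don't think there are any math or indexing errors, but I'm stuck. My code: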

with open('C:/users/steph/downloads/rosalind_cons (3).txt') as f:
    seqs = f.read().splitlines()
#remove all objects that are not sequences of interest
for s in seqs:
    if s[0] == '>':
        seqs.remove(s)
n = range(len(seqs[0])+1)
#lists to store counts for each nucleotide
A, C, G, T = [0 for i in n], [0 for i in n], [0 for i in n], [0 for i in n]
#see what nucleotide is at each index and augment the 
#same index of the respective list
def counter(Q):
    for q in Q:
        for k in range(len(q)):
            if q[k] == 'A':
                A[k] += 1
            elif q[k] == 'C':
                C[k] += 1
            elif q[k] == 'G':
                G[k] += 1
            elif q[k] == 'T':
                T[k] += 1
counter(seqs)
#find the max of all the counter lists at every index 
#and add the respective nucleotide to the consensus sequence
def consensus(a,t,c,g):
        consensus = ''
        for k in range(len(a)):
            if (a[k] > t[k]) and (a[k]>c[k]) and (a[k]>g[k]):
                consensus = consensus+"A"
            elif (t[k] > a[k]) and (t[k]>c[k]) and (t[k]>g[k]):
                consensus = consensus+ 'T'
            elif (c[k] > t[k]) and (c[k]>a[k]) and (c[k]>g[k]):
                consensus = consensus+ 'C'
            elif (g[k] > t[k]) and (g[k]>c[k]) and (g[k]>a[k]):
                consensus = consensus+ 'G'
            #ensure a nucleotide is added to consensus sequence
            #when more than one index has the max value
            else:
                if max(a[k],c[k],t[k],g[k]) in a:
                    consensus = consensus + 'A'
                elif max(a[k],c[k],t[k],g[k]) in c:
                    consensus = consensus + 'C'
                elif max(a[k],c[k],t[k],g[k]) in t:
                    consensus = consensus + 'T'
                elif max(a[k],c[k],t[k],g[k]) in g:
                    consensus = consensus + 'G'
        print(consensus)
        #debugging, ignore this --> print('len(consensus)',len(consensus))
consensus(A,T,C,G)
#debugging, ignore this --> print('len(A)',len(A))
print('A: ',*A, sep=' ')
print('C: ',*C, sep=' ')
print('G: ',*G, sep=' ')
print('T: ',*T, sep=' ')

Thanks for your time.

  • The following line has a bug:

    n = range(len(seqs[0])+1)

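It makes every list one element too long (the consensus gets an extra A and each of the four count lists gets an extra 0). Remove the +1 and it should work.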

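  • Also, your output has two spaces after each label; remove the space after the : in your print statements.
  • If you fix those two lines, it will work on the sample, but it will fail for sequences that span more than one line (as they do in the real dataset).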

Try merging those lines with the code below:

new_seqs = list()
for s in seqs:
    if s.startswith('>'):
        new_seqs.append('')
    else:
        new_seqs[-1]+=s
seqs = new_seqs
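
Putting the three fixes together, the whole thing could look roughly like the sketch below. This is only one way to write it, not the only correct one: the function names (read_fasta, profile, consensus, format_profile) and the inline sample lines are made up for illustration, and ties are broken in A, C, G, T order, which I'm assuming is fine since the problem accepts any valid consensus. It also sidesteps the `max(...) in a` test in your else branch, which checks the whole list rather than position k.

```python
def read_fasta(lines):
    """Join the lines of each FASTA record into one sequence string."""
    seqs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            seqs.append('')          # a header starts a new record
        elif seqs:
            seqs[-1] += line         # continuation of the current record
    return seqs


def profile(seqs):
    """Count A, C, G and T at every position (note: no +1 on the length)."""
    length = len(seqs[0])
    counts = {base: [0] * length for base in 'ACGT'}
    for seq in seqs:
        for k, base in enumerate(seq):
            if base in counts:       # ignore anything that is not A/C/G/T
                counts[base][k] += 1
    return counts


def consensus(counts):
    """Pick the most frequent base per position; ties go to A, C, G, T in that order."""
    length = len(counts['A'])
    return ''.join(max('ACGT', key=lambda b: counts[b][k]) for k in range(length))


def format_profile(counts):
    """One line per base, with a single space after the colon."""
    return [f"{base}: " + ' '.join(map(str, counts[base])) for base in 'ACGT']


# Hypothetical sample data; in practice pass f.read().splitlines() instead.
sample = ['>s1', 'AT', 'CC', '>s2', 'AT', 'GC', '>s3', 'GT', 'CA']
counts = profile(read_fasta(sample))
print(consensus(counts))
print('\n'.join(format_profile(counts)))
```

With your file you would call read_fasta on f.read().splitlines() and leave the rest unchanged.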
