import sys

seq = 'TGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTG'
for nucleotide in seq:
    if nucleotide == 'A':
        a_nt = seq.count('A')
    elif nucleotide == 'G':
        g_nt = seq.count('G')
    elif nucleotide == 'C':
        c_nt = seq.count('C')
    elif nucleotide == 'T':
        t_nt = seq.count('T')
    elif nucleotide == 'N':
        n_nt = seq.count('N')
    else:
        sys.exit("Did not code")
print(a_nt, g_nt, c_nt, t_nt, n_nt)
Error:
NameError: name 'n_nt' is not defined. Did you mean: 'a_nt'?
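If a nucleotide is not one of "AGCTN", the script should stop via sys.exit. The count for N should be printed even when it is zero. Printing only a_nt, g_nt, c_nt and t_nt works fine, but n_nt does not work.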
Just compute all the counts without the for loop. That way every variable is set, even when its count is zero:
seq = 'TGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTG'
a_nt = seq.count('A')
g_nt = seq.count('G')
c_nt = seq.count('C')
t_nt = seq.count('T')
n_nt = seq.count('N')
print(a_nt, g_nt, c_nt, t_nt, n_nt)
# or more efficient
from collections import Counter
counts = Counter(seq)
for letter in 'AGCTN':
    print(counts[letter], end=' ')
Output:
11 11 10 7 0
11 11 10 7 0
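Neither snippet above keeps the question's check that stops on a character outside "AGCTN". Here is a minimal sketch of one way to keep it. The helper count_nucleotides and the constant ALLOWED are hypothetical names, not part of the original code, and it raises ValueError instead of calling sys.exit so the caller can decide how to react:

```python
from collections import Counter

# Hypothetical constant: the alphabet named in the question.
ALLOWED = "AGCTN"


def count_nucleotides(seq):
    """Return counts for A, G, C, T and N; reject any other character."""
    counts = Counter(seq)
    unexpected = set(counts) - set(ALLOWED)
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # A Counter returns 0 for letters that never occur, so N is always reported.
    return {letter: counts[letter] for letter in ALLOWED}


seq = "TGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTG"
print(count_nucleotides(seq))
# {'A': 11, 'G': 11, 'C': 10, 'T': 7, 'N': 0}
```

If exiting the script is really what you want, you can catch the ValueError at the top level and pass its message to sys.exit there.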
I would suggest using collections.Counter:
from collections import Counter
possible_nucleotides = ["A", "G", "C", "N", "T"]
seq = "TGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTG"
seq_counts = Counter(seq)
missing_nucleotides = {x: 0 for x in set(possible_nucleotides) - set(seq_counts.keys())}
seq_counts.update(missing_nucleotides)
seq_counts will then look like this:
Counter({'G': 11, 'A': 11, 'C': 10, 'T': 7, 'N': 0})
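Keep in mind that updating the Counter is entirely optional, because looking up a key that is not present returns 0 rather than raising an error.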
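A quick way to see this for yourself, using the same sequence and skipping the update step (n_count is just an illustrative name):

```python
from collections import Counter

seq_counts = Counter("TGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTG")

# 'N' never occurs in the sequence, yet the lookup does not raise KeyError.
n_count = seq_counts["N"]
print(n_count)  # 0

# The lookup also does not add the key to the Counter.
print("N" in seq_counts)  # False
```

So the update is only needed if you want the zero entry to show up when you print or iterate over the Counter.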