I want to use the sum function to count occurrences of several specific characters, but my script only works for one character



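This script is supposed to calculate the total weight of a protein, so I decided to count how often certain characters occur. However, only the first equation produces a result, so the total weight comes out equal to that one value (every value below the first equation is zero, which is definitely incorrect). How can I get my script to take the other lines into account? This is an abbreviated version: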

akt3_file = open('AKT3 fasta.txt', 'r') #open up the fasta file
for line in akt3_file:
    ala =(sum(line.count('A') for line in akt3_file)*89*1000) #this value is 1780000
    arg =(sum(line.count('R') for line in akt3_file)*174*1000)
    asn =(sum(line.count('N') for line in akt3_file)*132*1000)
    asp =(sum(line.count('D') for line in akt3_file)*133*1000)
    asx =(sum(line.count('B') for line in akt3_file)*133*1000)
protein_weight = ala+arg+asn+asp+asx
print(protein_weight) # the problem is that this value is also 1780000
akt3_file.close() #close the fasta file

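The problem you're running into is that you are trying to iterate over the lines of the file multiple times. While that is actually possible (unlike most iterators, file objects can be rewound with seek), you're not doing it correctly, so every iteration except the first sees no data.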
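To make the exhaustion visible, here is a minimal sketch. The temporary file and its contents are made up for illustration and are not the asker's data; it only shows that a second pass over an already-consumed file object yields nothing until the file is rewound with seek(0).

```python
import os
import tempfile

# Write a tiny stand-in file (hypothetical contents, not the real FASTA file).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('AAR\nARN\n')
    path = tmp.name

with open(path, 'r') as f:
    first_pass = sum(line.count('A') for line in f)   # consumes every line
    second_pass = sum(line.count('R') for line in f)  # nothing left to read
    f.seek(0)                                         # rewind to the start
    after_seek = sum(line.count('R') for line in f)   # sees the data again

os.remove(path)
print(first_pass, second_pass, after_seek)  # prints: 3 0 2
```

Rewinding before every count would work, but it rereads the file once per character.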

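In this case, you probably don't need to iterate over the lines at all. Just read the full text of the file into a string and count the characters you want in that string: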

with open('AKT3 fasta.txt', 'r') as akt_3file:  # A with is not necessary, but a good idea.
    data = akt_3file.read()        # Read the whole file into the data string.
ala = data.count('A') * 89 * 1000  # Now we can count all the occurrences in all lines at
arg = data.count('R') * 174 * 1000 # once, and there's no issue iterating the file, since
asn = data.count('N') * 132 * 1000 # we're not dealing with the file any more, just the
asp = data.count('D') * 133 * 1000 # immutable data string.
asx = data.count('B') * 133 * 1000
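If many residues are involved, the same idea can be sketched with a lookup table. The function name protein_weight and the WEIGHTS dictionary are hypothetical helpers; the weights are copied from the snippets above. Skipping lines that start with '>' is an added assumption (FASTA header lines contain letters that would otherwise be counted) and is not part of the answer above.

```python
# Weights copied from the snippets above; multiplied by 1000 as in the question.
WEIGHTS = {'A': 89, 'R': 174, 'N': 132, 'D': 133, 'B': 133}

def protein_weight(fasta_text, weights=WEIGHTS, scale=1000):
    """Sum weight * count over the sequence lines of a FASTA string."""
    # Assumption: header lines start with '>' and should not be counted.
    sequence = ''.join(
        line.strip() for line in fasta_text.splitlines()
        if not line.startswith('>')
    )
    return sum(sequence.count(residue) * w * scale
               for residue, w in weights.items())

# Made-up sample text, used only to exercise the function.
sample = '>example header with A and R\nAARN\nDB\n'
print(protein_weight(sample))  # prints: 750000
```

With the real file, the text from akt_3file.read() could be passed in place of sample.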
