I wrote a function, mutate_v1, that introduces random mutations into a DNA sequence:
```python
import random

def mutate_v1(sequence, mutation_rate):
    dna_list = list(sequence)
    # One mutation attempt per position; each attempt hits a random site
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)
```
If I apply the function to every element of G0, I get a new generation of mutants, G1 (a list of mutated sequences):
```python
G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
G1 = [mutate_v1(s, 0.01) for s in G0]
print(G1)
# ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
```
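Why does G1 look identical to G0? With mutation_rate=0.01, each 5-base sequence gets on average 5 × 0.01 = 0.05 mutation attempts per generation, and because mutate_v1 draws the replacement from all four bases, one attempt in four writes back the base that was already there. The sketch below checks that arithmetic with a seeded simulation; the seed, the trial count, and the helper name `count_differences` are illustrative assumptions.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ (hypothetical helper)."""
    return sum(x != y for x, y in zip(a, b))

seq = 'CTGAA'
rate = 0.01

# Expected mutation attempts per sequence per generation: len(seq) * rate
expected_attempts = len(seq) * rate
# Only 3 of the 4 possible replacement bases actually change the site
expected_changes = expected_attempts * 3 / 4

random.seed(0)  # fixed seed so the run is reproducible
trials = 100_000
observed = sum(count_differences(seq, mutate_v1(seq, rate)) for _ in range(trials)) / trials

print(f"expected attempts per sequence: {expected_attempts:.3f}")
print(f"expected changed bases (approx.): {expected_changes:.4f}")
print(f"observed changed bases (mean of {trials} trials): {observed:.4f}")
```

So most single-generation runs at this rate leave every sequence untouched; changes only accumulate over many generations.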
How can I repeat this for 20 generations?
I can do it by hand like this:
```python
G1 = [mutate_v1(s, 0.01) for s in G0]
G2 = [mutate_v1(s, 0.01) for s in G1]
G3 = [mutate_v1(s, 0.01) for s in G2]
G4 = [mutate_v1(s, 0.01) for s in G3]
G5 = [mutate_v1(s, 0.01) for s in G4]
G6 = [mutate_v1(s, 0.01) for s in G5]
G7 = [mutate_v1(s, 0.01) for s in G6]
```
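But I believe a for loop would be better. I've tried several versions, but none of them worked.
Can anyone help?

Use range to iterate over the number of generations and store each generation in a list, where each generation is the result of mutating the previous one: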
```python
G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generations = [G0]
for _ in range(20):
    previous_generation = generations[-1]
    generations.append([mutate_v1(s, 0.01) for s in previous_generation])

# then you can access any generation by index
print(generations[1])   # generation 1
print(generations[20])  # generation 20
```
Output:
```
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAT']
```
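The same loop can be wrapped in a function so the number of generations and the mutation rate become parameters. This is a minimal sketch assuming the question's mutate_v1 (re-declared so the block runs on its own); the name `evolve` and its parameters are hypothetical.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def evolve(g0, n_generations, mutation_rate=0.01):
    """Return [G0, G1, ..., Gn]; each generation mutates every sequence of the previous one."""
    generations = [list(g0)]
    for _ in range(n_generations):
        previous_generation = generations[-1]
        generations.append([mutate_v1(s, mutation_rate) for s in previous_generation])
    return generations

G0 = ['CTGAA'] * 5
generations = evolve(G0, 20)
print(len(generations))  # 21: G0 plus 20 mutated generations
print(generations[20])   # generation 20
```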
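Dani's answer is a nice, simple solution, but I'd like to demonstrate another approach that uses a slightly more advanced Python technique, a generator function: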
```python
def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g  # hand back the current generation and pause here
        g = [mutate_v1(seq, 0.01) for seq in g]  # compute the next one on resume
```
Now mutation_generator is an infinite generator, which means that, in principle, you can keep evolving your sequences indefinitely. To grab 20 generations:
```python
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
# The first value yielded is g0 itself, so this list holds G0 through G19
twenty_generations = [next(generation) for _ in range(20)]
```
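The standard library's itertools.islice offers an equivalent way to take a fixed number of items from an infinite generator without writing the next() calls yourself. A minimal sketch; mutate_v1 and mutation_generator are re-declared from above so the block runs on its own.

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

g0 = ['CTGAA'] * 5
generation = mutation_generator(g0)

# islice stops after 20 items; the generator itself is left paused, not closed
twenty_generations = list(islice(generation, 20))
print(len(twenty_generations))  # 20
```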
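The nice thing about this generator is that we can pick up where we left off at any time. Say you've done some analysis on the first 20 generations and now want to see what happens over the next 100: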
```python
next_hundred = [next(generation) for _ in range(100)]
```
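We could initialize a new generator that uses the last generation in twenty_generations as its starting point, but that isn't necessary: our generation generator simply paused after 20 generations and is ready to keep mutating whenever next(generation) is called.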
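One way to confirm that the generator resumes rather than restarts: with a fixed random seed, 20 generations followed by 100 more should match the first 120 generations of a fresh run from the same seed, because both runs make exactly the same sequence of random calls. A minimal sketch; the seed value is arbitrary.

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

g0 = ['CTGAA'] * 5

# Run 1: take 20 generations, then resume for 100 more
random.seed(42)
generation = mutation_generator(g0)
twenty_generations = [next(generation) for _ in range(20)]
next_hundred = [next(generation) for _ in range(100)]

# Run 2: a fresh generator taking 120 generations in one go
random.seed(42)
fresh_run = list(islice(mutation_generator(g0), 120))

print(twenty_generations + next_hundred == fresh_run)  # True
```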
This opens up a lot of possibilities, including sending in a new mutation rate or even an entirely new mutation function. Really, whatever you want.
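Sending a new mutation rate relies on the generator's send() method: the value passed to send() becomes the result of the yield expression inside the generator. A minimal sketch, assuming a hypothetical variant mutation_generator_with_rate that treats a sent value as the new rate and None (which is what plain next() sends) as "keep the current rate".

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator_with_rate(g0, mutation_rate=0.01):
    g = g0.copy()
    while True:
        sent = yield g  # next() delivers None here; send(x) delivers x
        if sent is not None:  # "is not None" so that a rate of 0.0 is still accepted
            mutation_rate = sent
        g = [mutate_v1(seq, mutation_rate) for seq in g]

g0 = ['CTGAA'] * 5
generation = mutation_generator_with_rate(g0)

first = next(generation)            # a fresh generator must be started with next()
mutated = generation.send(1.0)      # rate 1.0: every position triggers a mutation attempt
frozen = generation.send(0.0)       # rate 0.0: no attempts, so nothing changes
still_frozen = next(generation)     # plain next() keeps the last rate (0.0)
print(first, mutated, frozen, sep='\n')
```

Calling send() with a non-None value before the first next() raises a TypeError, which is why the first step uses next().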
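Another benefit is that you can run several generators from the same initial sequences and watch how they diverge. This is entirely possible with a more traditional for loop inside a function, but the advantage of a generator is that it doesn't have to produce the whole series up front; it only mutates when you ask it to (via next()). For example: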
```python
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
universe_1 = mutation_generator(g0)
universe_2 = mutation_generator(g0)
universe_3 = mutation_generator(g0)

# The first generation is always the same as g0, but this can be modified if you desire
next(universe_1)
next(universe_2)
next(universe_3)

# Compare the first mutation without having to calculate twenty generations
# in each 'universe' before getting back results
first_mutation_u1 = next(universe_1)
first_mutation_u2 = next(universe_2)
first_mutation_u3 = next(universe_3)
```
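To see how the universes diverge, you need some distance between generations. One simple choice (an assumption here, not something the answer prescribes) is the Hamming distance: the number of positions at which two equal-length sequences differ, summed over the population. The helper names `hamming` and `population_distance` are hypothetical.

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

def hamming(a, b):
    """Positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def population_distance(pop_a, pop_b):
    """Sum of per-sequence Hamming distances between two populations of the same size."""
    return sum(hamming(a, b) for a, b in zip(pop_a, pop_b))

g0 = ['CTGAA'] * 5
universe_1 = mutation_generator(g0)
universe_2 = mutation_generator(g0)

# Advance both universes in lockstep for 200 generations and record how far apart they are
distances = [population_distance(u1, u2)
             for u1, u2 in islice(zip(universe_1, universe_2), 200)]
print(distances[0])   # 0: both universes start from g0
print(distances[-1])  # grows, noisily, as the universes drift apart
```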
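Similarly, you could modify mutation_generator to accept other arguments, such as a custom mutation function, or even let you change the mutation rate at any point.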
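For example, the mutation function and rate could become parameters with defaults, so existing calls keep working. In this sketch, the parameter names `mutate` and `mutation_rate` and the alternative function `point_mutation` (which, when it fires, always substitutes a different base at one random site) are hypothetical.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def point_mutation(sequence, mutation_rate):
    """Hypothetical alternative: with probability mutation_rate, change exactly one site to a different base."""
    if random.random() >= mutation_rate:
        return sequence
    site = random.randrange(len(sequence))
    new_base = random.choice([b for b in 'ATCG' if b != sequence[site]])
    return sequence[:site] + new_base + sequence[site + 1:]

def mutation_generator(g0, mutate=mutate_v1, mutation_rate=0.01):
    g = g0.copy()
    while True:
        yield g
        g = [mutate(seq, mutation_rate) for seq in g]

g0 = ['CTGAA'] * 5
default_run = mutation_generator(g0)  # same behavior as the original
point_run = mutation_generator(g0, mutate=point_mutation, mutation_rate=1.0)

start = next(default_run)
next(point_run)       # G0
g1 = next(point_run)  # each sequence differs from 'CTGAA' at exactly one site
print(g1)
```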
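Finally, as a side note, generators make it easy to skip ahead thousands of generations without storing every intermediate generation in memory: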
```python
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
for _ in range(10000):
    next(generation)
print(g0)                # first gen
print(next(generation))  # ten thousand generations later
```
Output:
```
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['TTGGA', 'CTTCG', 'TGTGA', 'TAACA', 'CATCG']
```
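The skip-ahead loop can also be written with itertools.islice, following the `nth` recipe from the itertools documentation: islice(iterable, n, None) discards the first n items and next() returns the one after them. A minimal sketch; mutate_v1 and mutation_generator are re-declared so the block runs on its own.

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

def nth(iterable, n, default=None):
    """Return the item at index n (0-based), consuming everything before it (itertools 'nth' recipe)."""
    return next(islice(iterable, n, None), default)

g0 = ['CTGAA'] * 5
generation = mutation_generator(g0)

# Index 0 is g0 itself, so index 10000 is ten thousand generations later.
# Skipped generations are discarded as they are produced, not stored.
later = nth(generation, 10000)
print(g0)
print(later)
```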
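With a for-loop-based approach, you would either have to create and store all 10,000 generations (wasting a lot of memory) or modify the code in Dani's answer to keep only the latest generation, making it behave more like a generator (but without the other benefits!).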
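For comparison, this is roughly what modifying the for loop to keep only the latest generation looks like: one variable is overwritten each iteration instead of appending to a list. A minimal sketch; the function name `last_generation` is hypothetical.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Same logic as the question's mutate_v1
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def last_generation(g0, n_generations, mutation_rate=0.01):
    """Mutate g0 n_generations times, keeping only the most recent generation (hypothetical helper)."""
    g = list(g0)
    for _ in range(n_generations):
        g = [mutate_v1(s, mutation_rate) for s in g]
    return g

g0 = ['CTGAA'] * 5
print(last_generation(g0, 10000))  # ten thousand generations later
```

Memory use matches the generator, but to go further you must re-run it from g0 or pass its previous result back in, whereas the generator simply resumes.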
Real Python has a good article on generators if you want to learn more, and of course you can also check the Python documentation.