Writing a function with nested for loops to get all overlaps between DNA sequences (bioinformatics)



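I wrote the function get_overlap to compute the overlap between a pair of reads in a given left-right orientation. Now I want to use it to write a function get_all_overlaps that evaluates every pair of reads in both orientations (left-right and right-left). It must return:

A dictionary of dictionaries giving the number of overlapping bases for each pair of reads in a specific left-right orientation. The overlap of a read with itself is meaningless and must not be included. If the resulting dictionary of dictionaries is called d, then d['Read2'] is a dictionary whose keys are the names of the reads that overlap with 'Read2' when 'Read2' is placed in the left position, and whose values are the number of overlapping bases for those reads.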

Example usage: assuming reads is the dictionary returned by read_data, then:

get_all_overlaps(reads)

should return the following dictionary of dictionaries (though not necessarily with the same ordering of key-value pairs):

{'Read1': {'Read3': 0, 'Read2': 1, 'Read5': 1, 'Read4': 0, 'Read6': 29},
'Read3': {'Read1': 0, 'Read2': 0, 'Read5': 0, 'Read4': 1, 'Read6': 1},
'Read2': {'Read1': 13, 'Read3': 1, 'Read5': 21, 'Read4': 0, 'Read6': 0},
'Read5': {'Read1': 39, 'Read3': 0, 'Read2': 1, 'Read4': 0, 'Read6': 14},
'Read4': {'Read1': 1, 'Read3': 1, 'Read2': 17, 'Read5': 2, 'Read6': 0},
'Read6': {'Read1': 0, 'Read3': 43, 'Read2': 0, 'Read5': 0, 'Read4': 1}}

Below is a dictionary whose keys are the read names and whose values are the associated read sequences, followed by my code for get_overlap:

read_map = {'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}

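def get_overlap(left, right):
    # the overlap cannot be longer than the shorter of the two reads
    max_overlap = min(len(left), len(right))
    # try the longest candidate first, then shorter ones
    for i in range(max_overlap):
        ovl = max_overlap - i
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''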
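
To make the left-right orientation concrete, here is a minimal sketch that calls get_overlap on two short toy sequences. The sequences a and b are invented for this illustration and are not part of the read data; the function is repeated so the snippet runs on its own.

```python
def get_overlap(left, right):
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    max_overlap = min(len(left), len(right))
    for i in range(max_overlap):
        ovl = max_overlap - i
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''


# Toy sequences, made up for illustration only (not from the read data).
a = 'ACGTTT'
b = 'TTTGCA'

print(get_overlap(a, b))  # a on the left, b on the right: prints TTT
print(get_overlap(b, a))  # b on the left, a on the right: prints A
```

Swapping the arguments changes the result, which is why each pair of reads has to be evaluated in both orientations.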

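The hint I got from the book: I must use the get_overlap function to find the overlap of a pair of reads. To generate all combinations of reads, I need two for loops: one loop over the reads in the left position, and another (nested inside the first) over the reads in the right position. Since we do not want the overlap of a read with itself, there should be an if statement that checks whether the left and right reads are the same.

Despite these hints, I have to admit I am still confused and lost. I hope someone can help me :D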

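This looks a lot like a homework problem. You should try to solve it yourself before asking for help; you will learn more that way.

Anyway, here is a solution: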

def get_all_overlaps(read_map):
    result = {}  # create an empty dictionary to put our results in
    for key1, item1 in read_map.items():  # loop over all items in the map (left read)
        result[key1] = {}  # create an empty dictionary for this read
        for key2, item2 in read_map.items():  # loop over all the items a second time (right read)
            if key1 == key2:  # check if the reads are the same
                continue  # they are the same, skip this comparison
            result[key1][key2] = len(get_overlap(item1, item2))  # compute the overlap and store its length
    return result  # return the result
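
If you want to check the result against the expected output from the question, a sketch like the following should work. It rewrites the same two-loop logic as a nested dictionary comprehension, which is only an alternative way of writing it and not something the exercise asks for. The names left_name, left_seq, right_name and right_seq are my own, and get_overlap is repeated so the snippet runs on its own.

```python
read_map = {
    'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
    'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
    'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
    'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
    'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
    'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT',
}


def get_overlap(left, right):
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    max_overlap = min(len(left), len(right))
    for i in range(max_overlap):
        ovl = max_overlap - i
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''


def get_all_overlaps(read_map):
    """Map each left read name to {right read name: overlap length}."""
    return {
        left_name: {
            right_name: len(get_overlap(left_seq, right_seq))
            for right_name, right_seq in read_map.items()
            if right_name != left_name  # skip a read paired with itself
        }
        for left_name, left_seq in read_map.items()
    }


overlaps = get_all_overlaps(read_map)
print(overlaps['Read1']['Read6'])    # 29, as in the expected output
print(overlaps['Read2']['Read5'])    # 21, as in the expected output
print('Read1' in overlaps['Read1'])  # False: no self-overlap entry
```

Each inner dictionary ends up with five entries, one for every other read, so six reads give 30 ordered pairs in total.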
