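I wrote the function get_overlap to compute the overlap between a pair of reads in both the left-right and the right-left direction. Now I want to use it to write a function get_all_overlaps, which must return:
A dictionary of dictionaries giving the number of overlapping bases for each pair of reads in a specific left-right orientation. The overlap of a read with itself is meaningless and must not be included. If the resulting dictionary of dictionaries is called d, then d['Read2'] is a dictionary whose keys are the names of the reads that overlap with 'Read2' when 'Read2' is placed in the left position, and whose values are the number of overlapping bases with those reads.
Example usage: assuming reads is the dictionary returned by read_data, then: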
get_all_overlaps(reads)
should return the following dictionary of dictionaries (though not necessarily with the same ordering of key-value pairs):
{'Read1': {'Read3': 0, 'Read2': 1, 'Read5': 1, 'Read4': 0, 'Read6': 29},
'Read3': {'Read1': 0, 'Read2': 0, 'Read5': 0, 'Read4': 1, 'Read6': 1},
'Read2': {'Read1': 13, 'Read3': 1, 'Read5': 21, 'Read4': 0, 'Read6': 0},
'Read5': {'Read1': 39, 'Read3': 0, 'Read2': 1, 'Read4': 0, 'Read6': 14},
'Read4': {'Read1': 1, 'Read3': 1, 'Read2': 17, 'Read5': 2, 'Read6': 0},
'Read6': {'Read1': 0, 'Read3': 43, 'Read2': 0, 'Read5': 0, 'Read4': 1}}
Below is a dictionary whose keys are the read names and whose values are the associated read sequences, followed by my code for get_overlap:
read_map = {'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}
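def get_overlap(left, right):
    # the overlap can never be longer than the shorter of the two reads
    max_overlap = min(len(left), len(right))
    for i in range(max_overlap):
        ovl = max_overlap - i  # try the longest candidate first
        # does the end of the left read match the start of the right read?
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''  # no overlap found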
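A quick sanity check of get_overlap on two of the reads listed above, as a minimal sketch. The variable names read1, read6, forward and backward are only illustrative; the sequences are copied from read_map. It shows that the order of the arguments matters: the first argument is treated as the left read.

```python
def get_overlap(left, right):
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    max_overlap = min(len(left), len(right))
    for i in range(max_overlap):
        ovl = max_overlap - i  # try the longest candidate first
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''

# Sequences copied from read_map (Read1 and Read6)
read1 = 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC'
read6 = 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'

forward = get_overlap(read1, read6)   # Read1 on the left, Read6 on the right
backward = get_overlap(read6, read1)  # Read6 on the left, Read1 on the right

print(len(forward))   # 29
print(len(backward))  # 0
```

These two numbers agree with the 'Read1' -> 'Read6' and 'Read6' -> 'Read1' entries of the expected output.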
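The hint I got from the book: I must use the get_overlap function to find the overlap of a pair of reads. To generate all combinations of reads I need two for loops: one loop over the reads in the left position, and another (nested inside the first) over the reads in the right position. But we don't want the overlap of a read with itself, so there should be an if statement that checks whether the left and right reads are the same.
Even with these hints, I have to admit I'm still confused and lost. Hope someone can help me :D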
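This looks a lot like a homework problem. You should try to solve it yourself before asking for help; you will learn more that way.
Anyway, here is a solution: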
def get_all_overlaps(read_map):
    result = {}  # create an empty dictionary to put our results in
    for key1, item1 in read_map.items():  # loop over all items in the map
        result[key1] = {}  # create an empty dictionary for this read
        for key2, item2 in read_map.items():  # loop over all the items a second time to compare
            if key1 == key2:  # check if the reads are the same
                continue  # they are the same, skip this comparison
            result[key1][key2] = len(get_overlap(item1, item2))  # compare overlaps and get the length
    return result  # return the result
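For comparison, the same two nested loops can also be written as a nested dictionary comprehension. This is only an alternative sketch, not what the book asks for: the name get_all_overlaps_compact and the three toy reads are made up for illustration, and get_overlap is repeated (with a lowercase i throughout) so the snippet runs on its own.

```python
def get_overlap(left, right):
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    max_overlap = min(len(left), len(right))
    for i in range(max_overlap):
        ovl = max_overlap - i  # try the longest candidate first
        if left[-ovl:] == right[:ovl]:
            return left[-ovl:]
    return ''

def get_all_overlaps_compact(reads):
    """Hypothetical variant: outer comprehension = left read, inner = right read."""
    return {
        left_name: {
            right_name: len(get_overlap(left_seq, right_seq))
            for right_name, right_seq in reads.items()
            if right_name != left_name  # skip a read compared with itself
        }
        for left_name, left_seq in reads.items()
    }

# Toy reads (made up for this example, not taken from the exercise)
toy_reads = {'A': 'AAACCC', 'B': 'CCCGGG', 'C': 'GGGAAA'}
overlaps = get_all_overlaps_compact(toy_reads)
print(overlaps)
# {'A': {'B': 3, 'C': 0}, 'B': {'A': 0, 'C': 3}, 'C': {'A': 3, 'B': 0}}
```

The explicit loops are probably easier to follow while learning; the comprehension just expresses the same left/right pairing in one expression.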