Creating a list of the positions of a substring within a string (DNA) (Python 3)



I am taking a bioinformatics course, and I am trying to write a function that finds all occurrences of a substring within a string.

def find_match(s, t):
    """Returns a list of all positions of a substring t in string s.
    Takes two arguments: s & t.
    """
    occurrences = []
    for i in range(len(s)-len(t)+1): # loop over alignment
        match = True
        for j in range(len(t)): # loop over characters
            if s[i+j] != t[j]:  # compare characters
                match = False   # mismatch
                break
            if match:   # all chars matched
                occurrences.append(i)
    return(occurrences)

print(find_match("GATATATGCATATACTT", "ATAT")) # [1, 1, 1, 1, 3, 3, 3, 3, 5, 5, 9, 9, 9, 9, 11, 11, 11, 13]
print(find_match("AUGCUUCAGAAAGGUCUUACG", "U")) # [1, 4, 5, 14, 16, 17]

The output above should exactly match the following:

[2,4,10]

[2,5,6,15,17,18]

How can I fix this? Preferably without using regular expressions.

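It looks like the code is badly indented. The line

if match:

must be outside the inner loop, so that it runs once per alignment, after every character has been compared. The expected output is also 1-based, so append i + 1 rather than i.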

def find_match(s, t):
    """Returns a list of all positions of a substring t in string s.
    Takes two arguments: s & t.
    """
    occurrences = []
    for i in range(len(s)-len(t)+1): # loop over alignment
        match = True
        for j in range(len(t)): # loop over characters
            if s[i+j] != t[j]:  # compare characters
                match = False   # mismatch
                break
        if match: # <--- must be outside the inner for loop
            occurrences.append(i + 1)  # report 1-based positions
    return occurrences

print(find_match("GATATATGCATATACTT", "ATAT")) # [2, 4, 10]
print(find_match("AUGCUUCAGAAAGGUCUUACG", "U")) # [2, 5, 6, 15, 17, 18]
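
As a side note, the nested loop can be written more compactly. The sketch below is only an illustration, not the answerer's code: it assumes the built-in str.startswith with a start offset is acceptable, and the name find_match_compact is made up for this example.

```python
def find_match_compact(s, t):
    """Return the 1-based start positions of every occurrence of t in s.

    Hypothetical helper: str.startswith(t, i) does the character-by-character
    comparison that the inner loop performs by hand.
    """
    return [i + 1 for i in range(len(s) - len(t) + 1) if s.startswith(t, i)]


print(find_match_compact("GATATATGCATATACTT", "ATAT"))  # [2, 4, 10]
print(find_match_compact("AUGCUUCAGAAAGGUCUUACG", "U"))  # [2, 5, 6, 15, 17, 18]
```

Overlapping matches (positions 2 and 4 in the first example) are still reported, because every alignment is tested independently.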

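You can use str.find:

def find_match(s, t):
    # Try every start index (including the last one), drop misses (-1),
    # de-duplicate with a set, and sort so the positions come out in order.
    return sorted({s.find(t, i) + 1 for i in range(len(s)) if s.find(t, i) != -1})

Output:

In [1]: find_match("AUGCUUCAGAAAGGUCUUACG", "U")
Out[1]: [2, 5, 6, 15, 17, 18]
In [2]: find_match("GATATATGCATATACTT", 'ATAT')
Out[2]: [2, 4, 10]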

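str.find returns the position of the first occurrence of the substring at or after the given start index. So iterate over the indices and pass each one to str.find as the start. If the substring is not found, find returns -1, so those results have to be filtered out. Several start indices can find the same occurrence, which is why the duplicates are removed with a set.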

In [1]: "GATATATGCATATACTT".find('ATAT', 0)
Out[1]: 1
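
Building on that, here is a minimal sketch that calls str.find only once per occurrence, instead of once per index. It resumes the search one character past the previous hit so that overlapping matches are kept. The name find_all is hypothetical, and returning an empty list for an empty pattern is an assumption of this example.

```python
def find_all(s, t):
    """Return the 1-based start positions of every occurrence of t in s."""
    positions = []
    if not t:  # assumption: an empty pattern yields no matches
        return positions
    start = s.find(t)
    while start != -1:
        positions.append(start + 1)   # convert 0-based index to 1-based position
        start = s.find(t, start + 1)  # step by one so overlapping hits are found
    return positions


print(find_all("GATATATGCATATACTT", "ATAT"))  # [2, 4, 10]
print(find_all("AUGCUUCAGAAAGGUCUUACG", "U"))  # [2, 5, 6, 15, 17, 18]
```

The results come out in order without needing a set or a sort, since each search starts after the previous match.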
