Creating a dictionary for pattern searching with the naive algorithm



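I am trying to count how many times a specific string pattern occurs within a sequence of characters; for bioinformatics enthusiasts, this amounts to counting how many times a specific motif occurs in a genome. For this purpose I found the following Python function: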

def search(pat, txt):
    M = len(pat)
    N = len(txt)

    for i in range(N - M + 1):
        j = 0
        while(j < M):
            if (txt[i + j] != pat[j]):
                break
            j += 1

        if (j == M):
            print(f"{pat} found at index ", i)

which returns results like this:

GAATC found at index  1734
GAATC found at index  2229
GAATC found at index  2363
GAATC found at index  2388
GAATC found at index  2399
GAATC found at index  2684
GAATC found at index  5634
GAATC found at index  7021
GAGTC found at index  1671
GAGTC found at index  4043

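And so on. As you can see, the pattern (motif) "GAATC" occurs 8 times, and a line is printed for every position where it occurs. I would like to have something like this in the terminal instead: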

GAATC found 8 times
GAGTC found 2 times

And so on. In the title I wrote "creating a dictionary". I think that is the best option, but I am open to any suggestion.

Can you help me? Thank you!

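def search(pat, txt):
    M = len(pat)
    N = len(txt)

    found = 0
    # Compare the slice at every start position, so overlapping matches count too
    for i in range(N - M + 1):
        if (txt[i:i+M] == pat):
            found += 1
    # Print one summary line after the whole text has been scanned
    print(f"{pat} found {found} times.")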

Or use a regular expression…

import re
def search(pat, txt):
    # The lookahead (?=...) matches without consuming text, so overlapping hits are counted
    found = len(re.findall(f'(?={pat})', txt))
    print(f"{pat} found {found} times.")
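Since the question mentions a dictionary, here is a minimal sketch of how the counts for several motifs could be collected into one. The helper names `count_occurrences` and `count_motifs`, as well as the toy `genome` string, are assumptions made for illustration and are not part of the original code.

```python
def count_occurrences(pat, txt):
    """Count occurrences of pat in txt, including overlapping ones."""
    m = len(pat)
    if m == 0:
        return 0  # an empty pattern is treated as "not found" in this sketch
    return sum(1 for i in range(len(txt) - m + 1) if txt[i:i + m] == pat)


def count_motifs(motifs, txt):
    """Return a dict mapping each motif to its number of occurrences."""
    return {pat: count_occurrences(pat, txt) for pat in motifs}


# Hypothetical toy sequence and motif list, for illustration only.
genome = "GAATCGAGTCGAATC"
counts = count_motifs(["GAATC", "GAGTC"], genome)
for pat, n in counts.items():
    print(f"{pat} found {n} times")
```

Returning the dictionary instead of printing inside the search function keeps the counting reusable, for example for sorting the motifs by frequency afterwards.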
