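I am trying to find how many times a specific string pattern occurs in a sequence of characters. For the bioinformatics enthusiasts: I want to count how many times a particular motif appears in a genome. To do this, I found the following Python function: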
def search(pat, txt):
    M = len(pat)
    N = len(txt)
    for i in range(N - M + 1):
        j = 0
        while j < M:
            if txt[i + j] != pat[j]:
                break
            j += 1
        if j == M:
            print(f"{pat} found at index ", i)
It returns results like the following:
GAATC found at index 1734
GAATC found at index 2229
GAATC found at index 2363
GAATC found at index 2388
GAATC found at index 2399
GAATC found at index 2684
GAATC found at index 5634
GAATC found at index 7021
GAGTC found at index 1671
GAGTC found at index 4043
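And so on. As you can see, the pattern (motif) "GAATC" occurs 8 times, and the output repeats it once for every position where it occurs. I would like the terminal to show something like this instead: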
GAATC found 8 times
GAGTC found 2 times
And so on. In the title I wrote "create a dictionary" because I think that is the best option, but I am open to any suggestions.
Can you help me? Thank you!
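Since the title asks for a dictionary, one option is to map each motif to the list of positions where it starts. That keeps the index information the original function printed, and the count is just the length of each list. A minimal sketch; the function name `find_positions` and the toy sequence are hypothetical:

```python
def find_positions(motifs, genome):
    """Map each motif to the 0-based start indices where it occurs (overlaps included)."""
    positions = {}
    for pat in motifs:
        m = len(pat)
        # slide a window of length m over the genome and keep every matching start index
        positions[pat] = [i for i in range(len(genome) - m + 1) if genome[i:i + m] == pat]
    return positions


if __name__ == "__main__":
    genome = "TTGAATCAGAATCCGAGTCGAATC"  # hypothetical toy sequence
    hits = find_positions(["GAATC", "GAGTC"], genome)
    for pat, idx in hits.items():
        print(f"{pat} found {len(idx)} times at {idx}")
```

With the toy sequence this prints `GAATC found 3 times at [2, 8, 19]` and `GAGTC found 1 times at [14]`.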
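Instead of printing each match, increment a counter and print once at the end:

def search(pat, txt):
    M = len(pat)
    N = len(txt)
    found = 0
    for i in range(N - M + 1):
        # compare the window of length M starting at i with the pattern
        if txt[i:i + M] == pat:
            found += 1
    print(f"{pat} found {found} times.")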
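If many motifs of the same length need counting, a `collections.Counter` (a `dict` subclass) can tally every window of the sequence in a single pass, instead of scanning the genome once per motif. A minimal sketch; the helper name `count_kmers` and the toy sequence are hypothetical, and motifs of different lengths would need one `Counter` per length:

```python
from collections import Counter


def count_kmers(txt, k):
    """Count every length-k window of txt in one pass (overlapping windows included)."""
    return Counter(txt[i:i + k] for i in range(len(txt) - k + 1))


if __name__ == "__main__":
    genome = "TTGAATCAGAATCCGAGTCGAATC"  # hypothetical toy sequence
    counts = count_kmers(genome, 5)
    for pat in ["GAATC", "GAGTC", "GGGGG"]:
        # a Counter returns 0 for a missing key instead of raising KeyError
        print(f"{pat} found {counts[pat]} times")
```

This prints 3, 1 and 0 for the three motifs. The trade-off is memory: the `Counter` stores every distinct k-mer in the sequence, not just the motifs of interest.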
Or use a regular expression:
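import re

def search(pat, txt):
    # (?=...) is a zero-width lookahead, so overlapping occurrences are counted too;
    # re.escape guards against regex metacharacters in pat
    found = len(re.findall(f"(?={re.escape(pat)})", txt))
    print(f"{pat} found {found} times.")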
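Both the slicing loop and the lookahead pattern count overlapping occurrences. The built-in `str.count` does not, which matters for self-overlapping motifs. A small comparison; the helper name `count_overlapping` is hypothetical:

```python
import re


def count_overlapping(pat, txt):
    # zero-width lookahead: each match consumes no characters, so overlaps are found
    return len(re.findall(f"(?={re.escape(pat)})", txt))


if __name__ == "__main__":
    txt = "AAAA"
    print(txt.count("AAA"))               # non-overlapping count: 1
    print(count_overlapping("AAA", txt))  # overlapping count: 2
```

For motifs that cannot overlap themselves (such as "GAATC") the two approaches agree, so `txt.count(pat)` would also be a valid shortcut there.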