Extracting the substring between two specified substrings in Python with re.search(pattern, text)

I have a string such as "ENST00000260682_3_4_5_6_7_8_9_BS_673.6". In Python, I need to use a regular expression with re.search() to extract the numbers between the ID and "_BS" and put them in a list like [3, 4, 5, 6, 7, 8, 9].

I tried:

import re
text="ENST00000260682_3_4_5_6_7_8_9_BS_673.6"
pattern=re.compile(r"^[[A-Z0-9]*_[.*]_BS]")
a=re.search(pattern, text)
print(a.group())

re.search() returns None, so a.group() fails with AttributeError: 'NoneType' object has no attribute 'group'.

Please help me solve this.
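The pattern in the question fails mainly because of the character class: inside [...], "." and "*" lose their special meaning, so [.*] matches exactly one literal "." or "*" rather than "any run of characters". A minimal demonstration on the same string:

```python
import re

text = "ENST00000260682_3_4_5_6_7_8_9_BS_673.6"

# Inside [...], "." and "*" are literal, so [.*] matches a single '.' or '*'
print(re.search(r"_[.*]_BS", text))  # None: the text contains no "_._BS" or "_*_BS"

# Outside a class, ".*" means "any characters", so this pattern does match
print(re.search(r"_.*_BS", text).group())  # _3_4_5_6_7_8_9_BS
```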

Search for every number that follows an underscore, in the part of the string before _BS:

import re
text = "ENST00000260682_3_4_5_6_7_8_9_BS_673.6"
pattern = re.compile(r"_(\d+)")  # capture the digits that follow each underscore
a = pattern.findall(text[:text.find('_BS')])  # only search the part before "_BS"
print(a)

Output: ['3', '4', '5', '6', '7', '8', '9']

Or, if you need integers, convert them:

a=[int(x) for x in re.findall(pattern, text[:text.find('_BS')])]
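Since the question specifically asks for re.search(), here is a minimal sketch that uses one search with a capture group instead of findall. It assumes the numbers always sit between an uppercase/digit ID and "_BS", separated by single underscores; the result is checked for None before calling .group(), which avoids the AttributeError from the question.

```python
import re

text = "ENST00000260682_3_4_5_6_7_8_9_BS_673.6"

# Group 1 captures the underscore-separated digits between the ID and "_BS"
m = re.search(r"^[A-Z0-9]+_((?:\d+_)*\d+)_BS", text)
if m:  # re.search returns None when nothing matches
    nums = [int(x) for x in m.group(1).split("_")]
    print(nums)  # [3, 4, 5, 6, 7, 8, 9]
```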

You can also do this easily with a generator instead of a regular expression:

def num_gen(s, delimiter='_', start_index=1, stop_token='BS'):
    # delimiter: the character to split the text on
    # start_index: the field index at which to start yielding values
    # stop_token: stop yielding when this token is encountered
    for x in s.split(delimiter)[start_index:]:
        if x != stop_token:
            yield x
        else:
            return

Usage:

t = "ENST00000260682_3_4_5_6_7_8_9_BS_673.6"
list(num_gen(t))
# ['3', '4', '5', '6', '7', '8', '9']
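The same "split, skip the ID, stop at the token" logic can also be expressed with the standard library's itertools.takewhile in place of the hand-written generator. A minimal sketch, assuming the same "_" delimiter and "BS" stop token, converting to int at the end:

```python
from itertools import takewhile

t = "ENST00000260682_3_4_5_6_7_8_9_BS_673.6"

# Drop the ID field, then keep fields until the first one equal to "BS"
parts = takewhile(lambda x: x != "BS", t.split("_")[1:])
nums = [int(x) for x in parts]
print(nums)  # [3, 4, 5, 6, 7, 8, 9]
```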

I'd suggest avoiding regular expressions unless they are really needed, especially if you're not familiar with them. Here is a relevant quote:

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.'
Now they have two problems."

There is a time and place for regular expressions, but until you reach it, don't make them part of the problem unnecessarily.
