Determine what each part of a regular expression matched



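I have several hundred (fairly simple) regular expressions and their matches in a large number of sequences. I would like to be able to tell which part of each regular expression matches which position in the target sequence. For example, the regular expression "[DSTE][^P][^DEWHFYC]D[GSAN]" is matched by positions 4 to 8 in the following sequence: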

ABCSGADAZZZ

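What I would like to get (programmatically) is, for each regular expression, 1) each "part" of the regular expression and 2) the positions in the target sequence that it matched: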

[DSTE] -- (3, 4),
[^P] -- (4, 5),
[^DEWHFYC] -- (5, 6),
D -- (6, 7),
[GSAN] -- (7, 8)

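I found this website, which basically does what I want: https://regex101.com/, but I'm not sure how far I would have to dive into regex parsing to do this in my own code (I'm using Python and R).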

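It's still not 100%, but it returns output for 3365 of the 3510 entries in the dataset. The few I checked lined up :)

My GitHub (links below) includes the input and the output, as csv and txt respectively.

Please ignore the global variables; I was thinking about switching the code over to see whether there is a noticeable speed improvement, but I haven't gotten around to it.

Currently, this version has problems with the order of operations for alternation and the start/end-of-line operators (^ $) when those appear as alternation options at the beginning or end of the string. I'm fairly confident this has to do with precedence, but I haven't managed to organize it well enough.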
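
As a small illustration of the precedence point, here is a sketch using Python's standard re module and a made-up pattern (not one from the dataset). Alternation binds more loosely than anchors and concatenation, so '^AB|CD$' is read as '(^AB)|(CD$)': the '^' belongs only to the first alternative and the '$' only to the second. The helper name span_or_none is hypothetical.

```python
import re

# Hypothetical pattern: '^' applies only to the first alternative,
# '$' only to the second one.
pattern = re.compile(r'^AB|CD$')

def span_or_none(text):
    """Return the (start, end) span of the first match, or None."""
    m = pattern.search(text)
    return m.span() if m else None

print(span_or_none('ABxx'))  # first alternative, anchored at the start
print(span_or_none('xxCD'))  # second alternative, anchored at the end
print(span_or_none('xABx'))  # neither anchored alternative applies
```

Any splitting of a pattern on unnested '|' therefore has to keep each anchor attached to its own alternative.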

The function call is in the last cell of the notebook. Instead of running the entire DataFrame with

for x in range(len(df)):
    try:
        df_expression = df.iloc[x, 2]
        df_subsequence = df.iloc[x, 1]
        # call function
        identify_submatches(df_expression, df_subsequence)
        print(dataframe_counting)
        dataframe_counting += 1
    except:
        pass

you can easily test one pair at a time by passing a pattern and the corresponding sequence to the function like this:

p = ''
s = ''
identify_submatches(p, s)

Code: https://github.com/jameshollisandrew/just_for_fun/blob/master/motif_matching/motif_matching_02.ipynb

Input: https://github.com/jameshollisandrew/just_for_fun/blob/master/motif_matching/elm_compiled_ss_re.csv

Output: https://github.com/jameshollisandrew/just_for_fun/blob/master/motif_matching/motif_matching_02_outputs.txt

# assumption: `r` is the third-party `regex` module; the patterns use (?R),
# which the standard `re` module does not support
import regex as r

def identify_submatches(exp_a, sub_a):
    """exp_a as input expression
    sub_a as input subject string"""
    input_exp = exp_a
    input_sub = sub_a
    # regex metacharacters are escaped explicitly
    m_gro = r'\^*\((?:[^()]+|(?R))*+\)(\{.+?\})*\$*'
    m_set = r'\^*\[.+?\](\{.+?\})*\$*'
    m_alt = r'\|'
    m_lit = r'\^*[.\w](\{.+?\})*\$*|\$'

    # PRINTOUT
    if (print_type == 1):
        print('\nExpression Input: {}\nSequence Input: {}'.format(exp_a, sub_a))
    if (print_type == 3):
        print('\n\nSTART ITERATION\nINPUTS\n  exp: {}\n  seq: {}'.format(exp_a, sub_a))

    # return the pattern match (USE IF SUB IS NOT MATCHED PRIMARY)
    # search once, so `m` is defined on both the match and the no-match path
    m = r.search(exp_a, sub_a)
    if m is None:
        print('Search expression: {} in sequence: {} returned no matches.\n\n'.format(exp_a, sub_a))
        return None
    sub_a = m.group()
    # >>>PRINTOUT<<<
    if print_type == 3:
        print('\nSEQUENCE TYPE M\n  exp: {}\n  seq: {}'.format(exp_a, sub_a))
    if (print_type == 1):
        print('Subequence Match: {}'.format(sub_a))

    # check if main expression has unnested alternation
    if len(alt_states(exp_a)) > 0:
        # returns matching alternative
        exp_a = alt_evaluation(exp_a, sub_a)
        # >>>PRINTOUT<<<
        if print_type == 3:
            print('\nALTERNATION RETURN\n  exp: {}\n  seq: {}'.format(exp_a, sub_a))

    # get initial expression list
    exp_list = get_states(exp_a)

    # count possible expression constructions
    status, matched_tuples = finite_state(exp_list, sub_a)
    # >>>PRINTOUT<<<
    if print_type == 3:
        print('\nCONFIRM EXPRESSION\n  exp: {}'.format(matched_tuples))

    # index matches
    indexer(input_exp, input_sub, matched_tuples)

def indexer(exp_a, sub_a, matched_tuples):
    # print each subexpression with the (start, end) span it matched
    sub_length = len(sub_a)
    sub_b = r.search(exp_a, sub_a)
    adj = sub_b.start()
    sub_b = sub_b.group()
    print('')
    for pair in matched_tuples:
        pattern, match = pair
        start = adj
        adj = adj + len(match)
        end = adj
        index_pos = (start, end)
        sub_b = slice_string(match, sub_b)
        print('\t{}\t{}'.format(pattern, index_pos))

def strip_nest(s):
    # drop the first and last character (the enclosing parentheses)
    s = s[1:]
    s = s[:-1]
    return s

def slice_string(p, s):
    pat = p
    string = s
    # handles escapes
    p = r.escape(p)
    # slice the input string on input pattern
    s = r.split(pattern = p, string = s, maxsplit = 1)[1]

    # >>>PRINTOUT<<<
    if print_type == 4:
        print('\nSLICE STRING\n  pat: {}\n  str: {}\n  slice: {}'.format(pat, string, s))

    return s

def alt_states(exp):
    # check each character in string
    op = 0 # open parenth
    cp = 0 # close parenth
    free_alt = [] # amend with index position of unnested alt
    for idx, c in enumerate(exp):
        if c == '(':
            op += 1
        elif c == ')':
            cp += 1
        elif c == '|':
            if op == cp:
                free_alt.append(idx)
    # split string if found
    alts = []
    if free_alt:
        _ = 0
        for i in free_alt:
            alts.append(exp[_:i])
            # move the slice start past this '|' so each alternative is separate
            _ = i + 1
        alts.append(exp[_:])
    # the truth value of this check can be checked against the length of the return
    # len(free_alt) > 0 means unnested "|" found
    return alts

def alt_evaluation(exp, sub):
    # >>>PRINTOUT<<<
    if print_type == 3:
        print('\nALTERNATION SELECTION\n  EXP: {}\n  SEQ: {}'.format(exp, sub))
    # gets alt index position
    alts = alt_states(exp)
    # variables for eval
    a_len = 0 # length of alternate match
    keep_len = 0 # length of return match
    keep = '' # return match string
    # evaluate alternatives
    for alt in alts:
        m = r.search(alt, sub)
        if m is not None:
            a_len = len(m.group())                             # length of match string
            # >>>PRINTOUT<<<
            if print_type == 3:
                print('  pat: {}\n  str: {}\n  len: {}'.format(alt, m.group(0), len(m.group(0))))
            if a_len >= keep_len:
                keep_len = a_len                               # sets alternate length to keep length
                exp = alt                                      # sets alt as keep variable
    # >>>PRINTOUT<<<
    if print_type == 3:
        print('  OUT: {}'.format(exp))
    return exp

def get_states(exp):
    """counts number of subexpressions to be checked
    creates FSM"""
    # >>>PRINTOUT<<<
    if print_type == 3:
        print('\nGET STATES\n  EXP: {}'.format(exp))
    # List of possible subexpression regex matches
    m_gro = r'\^*\((?:[^()]+|(?R))*+\)(\{.+?\})*\$*'
    m_set = r'\^*\[.+?\](\{.+?\})*\$*'
    m_alt = r'\|'
    m_lit = r'\^*[.\w](\{.+?\})*\$*|\$'

    # initialize capture list
    exp_list = []
    # loop through first level of subexpressions:
    while exp != '':
        if r.match(m_gro, exp):
            _ = r.match(m_gro, exp).group(0)
            exp_list.append(_)
            exp = slice_string(_, exp)
        elif r.match(m_set, exp):
            _ = r.match(m_set, exp).group(0)
            exp_list.append(_)
            exp = slice_string(_, exp)
        elif r.match(m_alt, exp):
            # this branch consumes no input, so stop here instead of looping forever
            _ = ''
            break
        elif r.match(m_lit, exp):
            _ = r.match(m_lit, exp).group(0)
            exp_list.append(_)
            exp = slice_string(_, exp)
        else:
            print('ERROR getting states')
            break
    n_states = len(exp_list)

    # >>>PRINTOUT<<<
    if print_type == 3:
        print('GET STATES OUT\n  states:\n  {}\n  # of states: {}'.format(exp_list, n_states))

    return exp_list

def finite_state(exp_list, seq, level = 0, pattern_builder = '', iter_count = 0, pat_match = [], seq_match = []):

    # >>>PRINTOUT<<<
    if (print_type == 3):
        print('\nSTARTING MACHINE\n  EXP: {}\n  SEQ: {}\n  LEVEL: {}\n  matched: {}\n  pat_match: {}'.format(exp_list, seq, level, pattern_builder, pat_match))

    # patterns
    m_gro = r'\^*\((?:[^()]+|(?R))*+\)(\{.+?\})*\$*'
    m_set = r'\^*\[.+?\](\{.+?\})*\$*'
    m_alt = r'\|'
    m_squ = r'\{(.),(.)\}'
    m_lit = r'\^*[.\w](\{.+?\})*\$*|\$'

    # set state, n_state
    state = 0
    n_states = len(exp_list)
    #save_state = []
    #save_expression = []

    # temp exp
    local_seq = seq
    # >>>PRINTOUT<<<
    if print_type == 3:
        print('\n  >>>MACHINE START')

    # set failure cap so no endless loop
    failure_cap = 1000
    # since len(exp_list) returns + 1 over iteration (0 index) use the last 'state' as success state
    while state != n_states:
        for exp in exp_list:
            # iterations
            iter_count+=1
            # >>>PRINTOUT<<<
            if print_type == 3:
                print('  iteration count: {}'.format(iter_count))
            # >>>PRINTOUT<<<
            if print_type == 3:
                print('\n  evaluating: {}\n  for string: {}'.format(exp, local_seq))

            # alternation reset
            if len(alt_states(exp)) > 0:
                # get operand options
                operands = alt_states(exp)
                # create temporary exp list
                temp_list = exp_list[state+1:]
                # add level
                level = level + 1
                # >>>PRINTOUT<<<
                if print_type == 3:
                    print('  ALT MATCH: {}\n  state: {}\n  opers returned: {}\n  level in: {}'.format(exp, state, operands, level))
                # compile local alternation
                for oper in operands:
                    # get substates
                    _ = get_states(oper)
                    # compile list
                    oper_list = _ + temp_list
                    # send to finite_state, sublevel
                    alt_status, pats = finite_state(oper_list, local_seq, level = level, pattern_builder=pattern_builder, iter_count=iter_count, pat_match=pat_match)
                    if alt_status == 'success':
                        return alt_status, pats

            # group cycle
            elif r.match(m_gro, exp) is not None:
                # get operand options
                operands = group_states(exp)
                # create temporary exp list
                temp_list = exp_list[state+1:]
                # add level
                level = level + 1
                # >>>PRINTOUT<<<
                if print_type == 3:
                    print('  GROUP MATCH: {}\n  state: {}\n  opers returned: {}\n  level in: {}'.format(exp, state, operands, level))
                # compile local
                oper_list = operands + temp_list
                # send to finite_state, sublevel
                group_status, pats = finite_state(oper_list, local_seq, level=level, pattern_builder=pattern_builder, iter_count=iter_count, pat_match=pat_match)
                if group_status == 'success':
                    return group_status, pats

            # quantifier reset
            elif r.search(m_squ, exp) is not None:
                # get operand options
                operands = quant_states(exp)
                # create temporary exp list
                temp_list = exp_list[state+1:]
                # add level
                level = level + 1
                # >>>PRINTOUT<<<
                if print_type == 3:
                    print('  QUANT MATCH: {}\n  state: {}\n  opers returned: {}\n  level in: {}'.format(exp, state, operands, level))
                # compile local, trying the largest repetition count first
                for oper in reversed(operands):
                    # compile list
                    oper_list = [oper] + temp_list
                    # send to finite_state, sublevel
                    quant_status, pats = finite_state(oper_list, local_seq, level=level, pattern_builder=pattern_builder, iter_count=iter_count, pat_match=pat_match)
                    if quant_status == 'success':
                        return quant_status, pats

            # record literal
            elif r.match(exp, local_seq) is not None:
                # add to local pattern
                m = r.match(exp, local_seq).group(0)
                local_seq = slice_string(m, local_seq)
                # >>>PRINTOUT<<<
                if print_type == 3:
                    print('  state transition: {}\n  state {} ==> {} of {}'.format(exp, state, state+1, n_states))
                # iterate state for match
                pattern_builder = pattern_builder + exp
                pat_match = pat_match + [(exp, m)]
                state += 1
            elif r.match(exp, local_seq) is None:
                # >>>PRINTOUT<<<
                if print_type == 3:
                    print('  Return FAIL on {}, level: {}, state: {}'.format(exp, level, state))
                status = 'fail'
                return status, pattern_builder

            # machine success
            if state == n_states:
                # >>>PRINTOUT<<<
                if print_type == 3:
                    print('  MACHINE SUCCESS\n  level: {}\n  state: {}\n  exp: {}'.format(level, state, pattern_builder))
                status = 'success'
                return status, pat_match
            # timeout
            if iter_count == failure_cap:
                state = n_states
                # >>>PRINTOUT<<<
                if print_type == 3:
                    print('===============\nFAILURE CAP MET\n  level: {}\n  exp state: {}\n==============='.format(level, state))
                break

def group_states(exp):

    # patterns
    m_gro = r'\^*\((?:[^()]+|(?R))*+\)(\{.+?\})*\$*'
    m_set = r'\^*\[.+?\](\{.+?\})*\$*'
    m_alt = r'\|'
    m_squ = r'\{(.),(.)\}'
    m_lit = r'\^*[.\w](\{.+?\})*\$*'
    ret_list = []
    # iterate over groups
    groups = r.finditer(m_gro, exp)
    for gr in groups:
        _ = strip_nest(gr.group())
        # alternation reset
        if r.search(m_alt, _):
            ret_list.append(_)
        else:
            _ = get_states(_)
            for thing in _:
                ret_list.append(thing)
    return(ret_list)

def quant_states(exp):

    # >>>PRINTOUT<<<
    if print_type == 4:
        print('\nGET QUANT STATES\n  EXP: {}'.format(exp))
    squ_opr = r'(.+)\{.,.\}'
    m_squ = r'\{(.),(.)\}'
    # create states
    states_list = []
    # get operand
    operand_obj = r.finditer(squ_opr, exp)
    for match in operand_obj:
        operand = match.group(1)
    # get repetitions
    fa = r.findall(m_squ, exp)
    for m, n in fa:
        # loop through range
        for x in range(int(m), (int(n)+1)):
            # construct string
            _ = operand + '{' + str(x) + '}'
            # append to list
            states_list.append(_)
    # >>>PRINTOUT<<<
    if print_type == 4:
        print('  QUANT OUT: {}\n'.format(states_list))
    return states_list

%%time
print_type = 1
"""0:    
1: includes input
2: 
3: all output prints on """

dataframe_counting = 0
for x in range(len(df)):
    try:
        df_expression = df.iloc[x, 2]
        df_subsequence = df.iloc[x, 1]
        # call function
        identify_submatches(df_expression, df_subsequence)
        print(dataframe_counting)
        dataframe_counting += 1
    except:
        pass

Example of the returned output

The output values (i.e. the subexpression and its index pair) are separated by tabs.

Expression Input: [KR]{1,4}[KR].[KR]W.
Sequence Input: TRQARRNRRRRWRERQRQIH
Subequence Match: RRRRWR
[KR]{1} (7, 8)
[KR]    (8, 9)
.   (9, 10)
[KR]    (10, 11)
W   (11, 12)
.   (12, 13)
2270
Expression Input: [KR]{1,4}[KR].[KR]W.
Sequence Input: TASQRRNRRRRWKRRGLQIL
Subequence Match: RRRRWK
[KR]{1} (7, 8)
[KR]    (8, 9)
.   (9, 10)
[KR]    (10, 11)
W   (11, 12)
.   (12, 13)
2271
Expression Input: [KR]{1,4}[KR].[KR]W.
Sequence Input: TRKARRNRRRRWRARQKQIS
Subequence Match: RRRRWR
[KR]{1} (7, 8)
[KR]    (8, 9)
.   (9, 10)
[KR]    (10, 11)
W   (11, 12)
.   (12, 13)
2272
Expression Input: [KR]{1,4}[KR].[KR]W.
Sequence Input: LDFPSKKRKRSRWNQDTMEQ
Subequence Match: KKRKRSRWN
[KR]{4} (5, 9)
[KR]    (9, 10)
.   (10, 11)
[KR]    (11, 12)
W   (12, 13)
.   (13, 14)
2273
Expression Input: [KR]{1,4}[KR].[KR]W.
Sequence Input: ASQPPSKRKRRWDQTADQTP
Subequence Match: KRKRRWD
[KR]{2} (6, 8)
[KR]    (8, 9)
.   (9, 10)
[KR]    (10, 11)
W   (11, 12)
.   (12, 13)
2274
Expression Input: [KR]{1,4}[KR].[KR]W.
Sequence Input: GGATSSARKNRWDETPKTER
Subequence Match: RKNRWD
[KR]{1} (7, 8)
[KR]    (8, 9)
.   (9, 10)
[KR]    (10, 11)
W   (11, 12)
.   (12, 13)
2275
Expression Input: [KR]{1,4}[KR].[KR]W.
Sequence Input: PTPGASKRKSRWDETPASQM
Subequence Match: KRKSRWD
[KR]{2} (6, 8)
[KR]    (8, 9)
.   (9, 10)
[KR]    (10, 11)
W   (11, 12)
.   (12, 13)
2276
Expression Input: [VMILF][MILVFYHPA][^P][TASKHCV][AVSC][^P][^P][ILVMT][^P][^P][^P][LMTVI][^P][^P][LMVCT][ILVMCA][^P][^P][AIVLMTC]
Sequence Input: LLNAATALSGSMQYLLNYVN
Subequence Match: LLNAATALSGSMQYLLNYV
[VMILF] (0, 1)
[MILVFYHPA] (1, 2)
[^P]    (2, 3)
[TASKHCV]   (3, 4)
[AVSC]  (4, 5)
[^P]    (5, 6)
[^P]    (6, 7)
[ILVMT] (7, 8)
[^P]    (8, 9)
[^P]    (9, 10)
[^P]    (10, 11)
[LMTVI] (11, 12)
[^P]    (12, 13)
[^P]    (13, 14)
[LMVCT] (14, 15)
[ILVMCA]    (15, 16)
[^P]    (16, 17)
[^P]    (17, 18)
[AIVLMTC]   (18, 19)
2277
Expression Input: [VMILF][MILVFYHPA][^P][TASKHCV][AVSC][^P][^P][ILVMT][^P][^P][^P][LMTVI][^P][^P][LMVCT][ILVMCA][^P][^P][AIVLMTC]
Sequence Input: IFEASKKVTNSLSNLISLIG
Subequence Match: IFEASKKVTNSLSNLISLI
[VMILF] (0, 1)
[MILVFYHPA] (1, 2)
[^P]    (2, 3)
[TASKHCV]   (3, 4)
[AVSC]  (4, 5)
[^P]    (5, 6)
[^P]    (6, 7)
[ILVMT] (7, 8)
[^P]    (8, 9)
[^P]    (9, 10)
[^P]    (10, 11)
[LMTVI] (11, 12)
[^P]    (12, 13)
[^P]    (13, 14)
[LMVCT] (14, 15)
[ILVMCA]    (15, 16)
[^P]    (16, 17)
[^P]    (17, 18)
[AIVLMTC]   (18, 19)
2278
Expression Input: [VMILF][MILVFYHPA][^P][TASKHCV][AVSC][^P][^P][ILVMT][^P][^P][^P][LMTVI][^P][^P][LMVCT][ILVMCA][^P][^P][AIVLMTC]
Sequence Input: IYEKAKEVSSALSKVLSKID
Subequence Match: IYEKAKEVSSALSKVLSKI
[VMILF] (0, 1)
[MILVFYHPA] (1, 2)
[^P]    (2, 3)
[TASKHCV]   (3, 4)
[AVSC]  (4, 5)
[^P]    (5, 6)
[^P]    (6, 7)
[ILVMT] (7, 8)
[^P]    (8, 9)
[^P]    (9, 10)
[^P]    (10, 11)
[LMTVI] (11, 12)
[^P]    (12, 13)
[^P]    (13, 14)
[LMVCT] (14, 15)
[ILVMCA]    (15, 16)
[^P]    (16, 17)
[^P]    (17, 18)
[AIVLMTC]   (18, 19)
2279
Expression Input: [VMILF][MILVFYHPA][^P][TASKHCV][AVSC][^P][^P][ILVMT][^P][^P][^P][LMTVI][^P][^P][LMVCT][ILVMCA][^P][^P][AIVLMTC]
Sequence Input: IYKAAKDVTTSLSKVLKNIN
Subequence Match: IYKAAKDVTTSLSKVLKNI
[VMILF] (0, 1)
[MILVFYHPA] (1, 2)
[^P]    (2, 3)
[TASKHCV]   (3, 4)
[AVSC]  (4, 5)
[^P]    (5, 6)
[^P]    (6, 7)
[ILVMT] (7, 8)
[^P]    (8, 9)
[^P]    (9, 10)
[^P]    (10, 11)
[LMTVI] (11, 12)
[^P]    (12, 13)
[^P]    (13, 14)
[LMVCT] (14, 15)
[ILVMCA]    (15, 16)
[^P]    (16, 17)
[^P]    (17, 18)
[AIVLMTC]   (18, 19)
2280

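Data source: ELM (the Eukaryotic Linear Motif resource for Functional Sites in Proteins) 2020. Retrieved from http://elm.eu.org/searchdb.html

If you want to extract the positions of the string matched by each part of the regular expression, you should wrap each part in () so that every segment becomes a capturing group. Without that, you cannot tell which positions each part of the regex matched.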

([DSTE])([^P])([^DEWHFYC])(D)([GSAN])

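Now you can see that each part is separated. Each part of the regex can therefore be extracted with another regex:

\((.*?)(?=\)(?:\(|$))

Bonus: you can also extract the portion of the text matched by each part of the regex.

So, use the re.search(pattern, text, flags = 0) method to get the data you need, as follows: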

import re
text = 'ABCSGADAZZZ'
theRegex = r'([DSTE])([^P])([^DEWHFYC])(D)([GSAN])'
r1 = re.compile(r'\((.*?)(?=\)(?:\(|$))') # each part extractor
r2 = re.compile(theRegex) # your regex
grps = r1.findall(theRegex) # parts of regex
m = re.search(r2, text)
for i in range(len(grps)):
    print( 'Regex: {} | Match: {} | Range: {}'.format(grps[i], m.group(i+1), m.span(i+1)) )

Live example

Using the stringr package, you should be able to put it together like this:

> stringr::str_match_all(string = "ABCSGADAZZZ",
                         pattern = "[DSTE][^P][^DEWHFYC]D[GSAN]")
[[1]]
     [,1]
[1,] "SGADA"
> stringr::str_locate_all(string = "ABCSGADAZZZ",
                          pattern = "[DSTE][^P][^DEWHFYC]D[GSAN]")
[[1]]
     start end
[1,]     4   8

Then combine the outputs of the two functions, or write a simple wrapper function.
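
A minimal sketch of such a wrapper, written in Python rather than R to match the other code in this thread. The name locate_all is hypothetical; it returns the matched text together with 1-based inclusive positions, in the style of str_locate_all. Like the stringr calls above, it reports the whole match only, not the individual parts of the pattern.

```python
import re

def locate_all(pattern, string):
    """Return (matched_text, start, end) for every match, using
    1-based inclusive positions as stringr::str_locate_all does."""
    return [(m.group(0), m.start() + 1, m.end())
            for m in re.finditer(pattern, string)]

print(locate_all(r"[DSTE][^P][^DEWHFYC]D[GSAN]", "ABCSGADAZZZ"))
# [('SGADA', 4, 8)]
```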

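I have never seen a regex engine that exposes this kind of functionality in its API. Or I am simply not aware of such an API. Maybe there is one, but not necessarily in R or Python.

Either way, it is not as simple as it might seem.

Consider the regex /(a(b*))*/ against "abbabbb": the b* part matches more than one substring. Conversely, a single substring can be matched by more than one part of a regex.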
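
To make that concrete, here is a small sketch with Python's standard re module. The inner group takes part in two iterations of the outer repetition ('bb' and then 'bbb'), but the match object only keeps the last one, so a single span per part is not enough to describe what happened.

```python
import re

m = re.fullmatch(r'(a(b*))*', 'abbabbb')

# Group 2 is the b* part. It matched 'bb' in the first iteration and
# 'bbb' in the second; only the last iteration is remembered.
inner_text, inner_span = m.group(2), m.span(2)
print(inner_text, inner_span)  # bbb (4, 7)
```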

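Even if your regexes are "fairly simple"... are they really all as simple as the one in the question?

As others have already mentioned, you can use capturing groups to find out which group matched what, but to do that you need to edit the regexes yourself and keep track of the group indices. Or, yes, write your own parser. Because regexes cannot parse regexes: they are not powerful enough for their own grammar.

Well, maybe there is a way to parse and modify all your regexes automatically and easily (to add capturing groups), if they really are simple and more or less uniform. But with only one example regex given, there is no way to tell.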
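
For what it's worth, here is a rough sketch of what such an automatic rewrite could look like, under the assumption that every pattern is flat: only literal characters, escapes, '.', character classes and quantifiers, with no groups and no alternation. The names split_parts and part_spans are made up for this example, and the scanner is hand-written rather than taken from any library.

```python
import re

# A quantifier directly after an atom: {m}, {m,}, {m,n}, *, + or ?
QUANTIFIER = re.compile(r'\{\d+(?:,\d*)?\}|[*+?]')

def split_parts(pattern):
    """Split a flat pattern (no groups, no alternation) into atoms,
    each with its optional quantifier attached."""
    parts, i = [], 0
    while i < len(pattern):
        start = i
        ch = pattern[i]
        if ch in '()|':
            raise ValueError('groups and alternation are not supported')
        if ch == '[':
            i += 1
            if i < len(pattern) and pattern[i] == '^':
                i += 1
            if i < len(pattern) and pattern[i] == ']':
                i += 1  # a leading ']' is a literal inside a class
            while i < len(pattern) and pattern[i] != ']':
                i += 2 if pattern[i] == '\\' else 1
            if i >= len(pattern):
                raise ValueError('unterminated character class')
            i += 1  # consume the closing ']'
        elif ch == '\\':
            i += 2  # escape sequence such as \d or \.
        else:
            i += 1  # single literal character, '.', '^' or '$'
        q = QUANTIFIER.match(pattern, i)
        if q:
            i = q.end()
            if i < len(pattern) and pattern[i] == '?':
                i += 1  # lazy marker after the quantifier
        parts.append(pattern[start:i])
    return parts

def part_spans(pattern, text):
    """Yield, for each match, a list of (part, (start, end)) pairs."""
    parts = split_parts(pattern)
    grouped = re.compile(''.join('({})'.format(p) for p in parts))
    for m in grouped.finditer(text):
        yield [(p, m.span(k)) for k, p in enumerate(parts, start=1)]

for row in part_spans('[DSTE][^P][^DEWHFYC]D[GSAN]', 'ABCSGADAZZZ'):
    print(row)
```

For the pattern and sequence from the question this gives the spans (3, 4) through (7, 8). Anything with parentheses or '|' is rejected on purpose, because that is exactly where a real parser becomes necessary.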

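But maybe you are asking the wrong question: https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/

A trap that many posters fall into is to ask how to achieve some "small" aim, but never say what the larger aim is. Often the smaller aim is either impossible or rarely a good idea; instead, a different approach is required.

Update:

I changed your example string and regex a bit to account for the P{1,3} case you mentioned in the comments.

Here is the code to modify your regexes and get the output you want: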

import re
orig_re = "[DSTE]{1,1}[^P][^DEWHFYC]DP{1,3}[GSAN]"
# one part = a character class or a single character, plus an optional {..} quantifier
mod_re = r'((\[.*?\]|.)(\{.*?\})?)'
groups = re.findall(mod_re, orig_re)
print("parts of regex:", [g[0] for g in groups])
# wrap every part in its own capturing group
new_regex_str = re.sub(mod_re, r'(\1)', orig_re)
print("new regex with capturing groups:", new_regex_str)
new_re = re.compile(new_regex_str)
text = "ABCSGADPPAZZZSGADPA"
matches = new_re.finditer(text)
for m in matches:
    print( '----------')
    for g in range(len(groups)):
        print('#{}: {} -- {}'.format(g, groups[g][0], m.span(g+1)))

It will give you:

parts of regex: ['[DSTE]{1,1}', '[^P]', '[^DEWHFYC]', 'D', 'P{1,3}', '[GSAN]']
new regex with capturing groups: ([DSTE]{1,1})([^P])([^DEWHFYC])(D)(P{1,3})([GSAN])
----------
#0: [DSTE]{1,1} -- (3, 4)
#1: [^P] -- (4, 5)
#2: [^DEWHFYC] -- (5, 6)
#3: D -- (6, 7)
#4: P{1,3} -- (7, 9)
#5: [GSAN] -- (9, 10)
----------
#0: [DSTE]{1,1} -- (13, 14)
#1: [^P] -- (14, 15)
#2: [^DEWHFYC] -- (15, 16)
#3: D -- (16, 17)
#4: P{1,3} -- (17, 18)
#5: [GSAN] -- (18, 19)

The same in JS:

const orig_re = "[DSTE]{1,1}[^P][^DEWHFYC]DP{1,3}[GSAN]"
const mod_re = /((\[.*?\]|.)(\{.*?\})?)/g
const groups = [...orig_re.matchAll(mod_re)].map(g => g[0])
console.log("parts of regex:", groups)
const new_re = orig_re.replace(mod_re, "($1)")
console.log("new regex with capturing groups:", new_re)
const str = "ABCSGADPPAZZZSGADPA"
const matches = str.matchAll(new RegExp(new_re, "g"))
for (const m of matches) {
    console.log('----------')
    let pos = m.index
    groups.forEach((g, i) => console.log(`#${i}: ${g} -- (${pos},${pos += m[i+1].length})`))
}

Latest update