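I want to capture all the numbers inside square brackets. The numbers are separated by commas. For example, from the text
some text [7, 8], some other texts with 1 or 2 numbers [5]. other texts.
I want to capture 7, 8 and 5. I tried the following pattern: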
pat = (?<=\[)[\d,\s]*(\d)[\d,\s]*(?=\])
But for "[7,8]" the matches seem to overlap, and I only get "8".
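The behaviour can be reproduced with a short check. This is only a sketch: it assumes the pattern is passed to re.findall (the question does not say which function was used), and the variable names are illustrative. Because the first character class is greedy and also accepts digits, it swallows everything except the final digit, so the single capturing group holds only the last digit of each bracket.

```python
import re

# Pattern from the question, with the escapes written out.
pat = r'(?<=\[)[\d,\s]*(\d)[\d,\s]*(?=\])'
text = 'some text [7, 8], some other texts with 1 or 2 numbers [5]. other texts.'

# findall returns the contents of the one capturing group per match.
found = re.findall(pat, text)
print(found)  # ['8', '5']
```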
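Using lookbehind and lookahead here is, IMHO, overusing regex. It is simpler to match the whole bracketed pattern and then strip the first and last characters (the brackets). Code like this is easier to follow and understand: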
import re
sample = r"""
some text [7,8], some other [2, 3] texts with 1 or 2 numbers [5]. [4,
5] other texts
"""
# Match each whole "[...]" group, then drop the surrounding brackets.
result = [s[1:-1] for s in re.findall(r'\[\d+\s*(?:,\s*\d+)*\]', sample)]
print(result)
If you really want the regex itself to capture the result, you can do this:
result = re.findall(r'\[(\d+\s*(?:,\s*\d+)*)\]', sample)
print(result)
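Both variants return one string per bracket (for example '7, 8'), not the individual numbers. If separate numbers are needed, a second pass over each captured string is one way to get them. A minimal sketch, assuming integers are wanted; the names groups and numbers are illustrative:

```python
import re

text = 'some text [7, 8], some other texts with 1 or 2 numbers [5]. other texts.'

# Step 1: capture the contents of each bracketed group.
groups = re.findall(r'\[(\d+\s*(?:,\s*\d+)*)\]', text)

# Step 2: pull the digit runs out of each captured string.
numbers = [int(n) for g in groups for n in re.findall(r'\d+', g)]
print(groups)   # ['7, 8', '5']
print(numbers)  # [7, 8, 5]
```

The "1" and "2" outside the brackets are ignored because step 2 only looks at what step 1 captured.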
With the third-party regex module from PyPI:
import regex
pat = r'\[(?P<numbers>\d+)(?:,\s*(?P<numbers>\d+))*\]'
s = r'some text [7, 8], some other texts with 1 or 2 numbers [5]. other texts.'
results = [match.captures('numbers') for match in regex.finditer(pat, s)]
print(results)
See the Python proof.
Results: [['7', '8'], ['5']]
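The regex module is used here because the standard re module keeps only the last value of a repeated group and does not allow the same group name twice. A small sketch of that limitation, using numbered groups and a made-up three-number input (not from the question):

```python
import re

text = 'values [1, 2, 3] and [5]'

# The second group repeats, so only its final iteration survives.
pairs = re.findall(r'\[(\d+)(?:,\s*(\d+))*\]', text)
print(pairs)  # [('1', '3'), ('5', '')]

# Reusing a group name is rejected by the standard library.
try:
    re.compile(r'\[(?P<numbers>\d+)(?:,\s*(?P<numbers>\d+))*\]')
    duplicate_names_allowed = True
except re.error:
    duplicate_names_allowed = False
print(duplicate_names_allowed)  # False
```

The '2' is lost in the first tuple, which is what match.captures('numbers') avoids.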
Explanation of the expression
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?P<numbers>             group and capture to "numbers":
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of "numbers"
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    ,                        ','
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?P<numbers>             group and capture to "numbers":
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )                        end of "numbers"
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  \]                       ']'
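The same kind of annotation can also live inside the pattern by compiling it with re.VERBOSE. The sketch below is a standard-library variant, not the pattern above: the group names are dropped because re rejects duplicate names, and it only extracts each whole bracketed match.

```python
import re

# Stdlib-only variant with inline comments (whitespace is ignored in VERBOSE mode).
PATTERN = re.compile(r"""
    \[          # literal opening bracket
    \d+         # first number
    (?:         # non-capturing group, repeated 0 or more times
        ,       #   literal comma
        \s*     #   optional whitespace
        \d+     #   another number
    )*
    \]          # literal closing bracket
""", re.VERBOSE)

s = 'some text [7, 8], some other texts with 1 or 2 numbers [5]. other texts.'
bracketed = [m.group(0) for m in PATTERN.finditer(s)]
print(bracketed)  # ['[7, 8]', '[5]']
```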