Capturing valid comma-separated numbers with a Python regular expression



I am working with a multi-line string and trying to capture the valid comma-separated numbers in it.

For example:

my_string = """42     <---capture 42 in this line
1,234    <---capture 1,234 in this line
3,456,780    <---capture 3,456,780 in this line
34,56,780    <---nothing should be captured in this line, but 34 and 56,780 are
1234    <---nothing should be captured in this line, but 123 and 4 are
"""

Ideally, I would like re.findall to return:

['42', '1,234', '3,456,780']

This is my code:

import re

a = """
42
1,234
3,456,780
34,56,780
1234
"""
# One to three digits, then any number of ",ddd" groups
regex = re.compile(r'\d{1,3}(?:,\d{3})*')
print(regex.findall(a))

The code above gives:

['42', '1,234', '3,456,780', '34', '56,780', '123', '4']

But the output I want is:

['42', '1,234', '3,456,780']

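If you only want to capture whole lines that match the pattern, anchor the regex with ^ and $ and pass the re.MULTILINE flag so that they match at the start and end of each line, rather than only at the start and end of the whole string.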

regex = re.compile(r'^\d{1,3}(?:,\d{3})*$', re.MULTILINE)
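
As a quick check, here is a minimal sketch that runs the anchored pattern against the sample string from the question. The names line_regex and anchored_matches are only illustrative, and the sketch assumes each number sits alone on its line with no trailing text.

```python
import re

a = """
42
1,234
3,456,780
34,56,780
1234
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line,
# so a line is accepted only if the whole line is a well-formed number.
line_regex = re.compile(r'^\d{1,3}(?:,\d{3})*$', re.MULTILINE)
anchored_matches = line_regex.findall(a)
print(anchored_matches)  # ['42', '1,234', '3,456,780']
```

Note that a line such as "42 apples" would not match at all with this approach, because $ requires the number to be the last thing on the line.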

Use lookarounds to make sure the number is neither preceded nor followed by a digit or a comma:

import re
a = """
42
1,234
3,456,780
34,56,780
1234
"""
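# (?<![\d,]) rejects a match that starts right after a digit or comma;
# (?![\d,]) rejects a match that ends right before a digit or comma.
regex = re.compile(r'(?<![\d,])\d{1,3}(?:,\d{3})*(?![\d,])')
print(regex.findall(a))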

Output:

['42', '1,234', '3,456,780']
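
Unlike the anchored version, the lookaround pattern does not need each number to be on its own line. The following is a small sketch of that, using a made-up sentence (the text variable and its contents are hypothetical, not from the question):

```python
import re

number_regex = re.compile(r'(?<![\d,])\d{1,3}(?:,\d{3})*(?![\d,])')

# Hypothetical sample: numbers embedded in running text.
text = "Paid 1,234 for 42 items; 34,56,780 and 1234 are malformed, total 3,456,780."
inline_matches = number_regex.findall(text)
print(inline_matches)  # ['1,234', '42', '3,456,780']

# Caveat: a valid number directly followed by a comma is rejected too,
# because the negative lookahead forbids a trailing comma.
followed_by_comma = number_regex.findall("1,234, then more")
print(followed_by_comma)  # []
```

If numbers followed by punctuation commas need to be accepted, the lookahead would have to be relaxed, for example to forbid only a comma that is itself followed by a digit; whether that is appropriate depends on the input.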
