I'm working with a multiline string and trying to capture the valid comma-separated numbers in it.
For example:
my_string = """42 <---capture 42 in this line
1,234 <---capture 1,234 in this line
3,456,780 <---capture 3,456,780 in this line
34,56,780 <---capture nothing in this line (but currently 34 and 56,780 are captured)
1234 <---capture nothing in this line (but currently 123 and 4 are captured)
"""
Ideally, I'd like re.findall to return:
['42', '1,234', '3,456,780']
Here is my code:
import re

a = """
42
1,234
3,456,780
34,56,780
1234
"""
regex = re.compile(r'\d{1,3}(?:,\d{3})*')
print(regex.findall(a))
The result of the code above is:
['42', '1,234', '3,456,780', '34', '56,780', '123', '4']
But the output I want is:
['42', '1,234', '3,456,780']
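To see why the unanchored pattern splits the last two lines, one can list each match together with its position using re.finditer. In this sketch (match_spans is a hypothetical helper name, not from the question), \d{1,3} greedily takes up to three digits, and because nothing forces a match to end at a line boundary, findall simply starts a new match right after each partial one.

```python
import re

# The question's unanchored pattern, with the backslashes in \d restored.
PATTERN = re.compile(r'\d{1,3}(?:,\d{3})*')

def match_spans(line):
    """Hypothetical helper: return (matched_text, (start, end)) for every match."""
    return [(m.group(), m.span()) for m in PATTERN.finditer(line)]

print(match_spans("34,56,780"))  # [('34', (0, 2)), ('56,780', (3, 9))]
print(match_spans("1234"))       # [('123', (0, 3)), ('4', (3, 4))]
```

In "34,56,780" the group (?:,\d{3}) cannot consume ",56", so the first match stops at "34" and the search resumes at ",56,780".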
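If you only want to capture entire lines that match the pattern, you need to anchor the regex with ^ and $ and use the re.MULTILINE flag, so that they match at the start/end of each line rather than only at the start/end of the whole string.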
regex = re.compile(r'^\d{1,3}(?:,\d{3})*$', re.MULTILINE)
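A minimal runnable version of this answer, reusing the question's sample string a, with a second compile that omits the flag to show why re.MULTILINE is needed (the names anchored and no_flag are only illustrative):

```python
import re

a = """
42
1,234
3,456,780
34,56,780
1234
"""

# With re.MULTILINE, ^ and $ match at every line boundary,
# so each line must consist of the number and nothing else.
anchored = re.compile(r'^\d{1,3}(?:,\d{3})*$', re.MULTILINE)
print(anchored.findall(a))  # ['42', '1,234', '3,456,780']

# Without the flag, ^ only matches at position 0 of the string,
# which here is a newline, so nothing matches at all.
no_flag = re.compile(r'^\d{1,3}(?:,\d{3})*$')
print(no_flag.findall(a))  # []
```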
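Use lookbehind and lookahead assertions to ensure that the number is neither preceded nor followed by a digit or a comma: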
import re
a = """
42
1,234
3,456,780
34,56,780
1234
"""
regex = re.compile(r'(?<![\d,])\d{1,3}(?:,\d{3})*(?![\d,])')
print(regex.findall(a))
Output:
['42', '1,234', '3,456,780']
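The two answers give the same result on the sample input, but they are not interchangeable: the anchored version accepts a number only when it makes up the whole line, while the lookaround version also picks out well-formed numbers embedded in other text. A small sketch on a made-up input string (text is hypothetical sample data, not from the question):

```python
import re

# First answer: the entire line must be the number.
anchored = re.compile(r'^\d{1,3}(?:,\d{3})*$', re.MULTILINE)
# Second answer: the number must not touch a digit or comma on either side.
lookaround = re.compile(r'(?<![\d,])\d{1,3}(?:,\d{3})*(?![\d,])')

text = "total: 1,234 items\n34,56,780\n42"

print(anchored.findall(text))    # ['42']
print(lookaround.findall(text))  # ['1,234', '42']
```

Which one to use depends on whether each line is expected to contain only the number.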