How to return multiple regex values as a tuple

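I'm developing a Python program that searches incoming emails and returns coordinates. I'm trying to write a regular expression that selects the Lat/long values from a string. (I'm new to regex.)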

Here is a small example of a string I've been testing with:

content = """
WorkLocationBoundingBox
Latitude:30.556555Longitude:-97.659824
SecondLatitude:30.569138SecondLongitude:-97.650855
"""

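I came up with Latitude:(\d+).(\d+)Longitude:(.*), which I believe is close to what I need, but it splits 30 and 556555 into separate groups. However, -97.659824 is correctly captured in a single group.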
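A quick check of why the groups split, assuming the same `content` string as above: each `(\d+)` captures only digits and the `.` between them sits outside any group, so the integer and fractional parts land in separate groups. The `.*` after `Longitude:` simply takes the rest of the line, which is why the longitude looks correct. The check also suggests that the second pair is never matched at all, because `SecondLongitude` does not directly follow the latitude digits.

```python
import re

content = """
WorkLocationBoundingBox
Latitude:30.556555Longitude:-97.659824
SecondLatitude:30.569138SecondLongitude:-97.650855
"""

# The pattern from the question: two separate (\d+) groups around an unescaped "."
question_pattern = r"Latitude:(\d+).(\d+)Longitude:(.*)"
found = re.findall(question_pattern, content)
print(found)  # [('30', '556555', '-97.659824')]
```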

Ideally, I'd like a result like this:

[(30.556555, -97.659824, 30.569138, -97.650855)]

You could use 3 capture groups, where the first group matches the optional word before Latitude and Longitude.

((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)

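((?:Second)?) Capture group 1, optionally match Second
Latitude: Match literally
(-?\d+(?:\.\d+)?) Capture group 2, match an optional - followed by 1+ digits with an optional decimal part
\1Longitude: Backreference to what was matched in group 1, then match Longitude:
(-?\d+(?:\.\d+)?) Capture group 3, match an optional - followed by 1+ digits with an optional decimal part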
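To make the role of the backreference concrete, here is a small check using `re.fullmatch` on each line separately. The third input is a made-up line (not from the email) whose prefixes disagree; because `\1` must repeat whatever group 1 captured, that line is rejected as a whole.

```python
import re

regex = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"

# Group 1 is empty on the first line and "Second" on the second line;
# \1 then requires the same prefix in front of "Longitude:".
first = re.fullmatch(regex, "Latitude:30.556555Longitude:-97.659824")
second = re.fullmatch(regex, "SecondLatitude:30.569138SecondLongitude:-97.650855")
print(first.groups())   # ('', '30.556555', '-97.659824')
print(second.groups())  # ('Second', '30.569138', '-97.650855')

# Hypothetical mismatched line: group 1 is "Second" but "Longitude:" has no
# prefix, so \1 cannot match and fullmatch returns None.
mismatched = re.fullmatch(regex, "SecondLatitude:1.0Longitude:2.0")
print(mismatched)  # None
```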


import re

regex = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"
s = ("WorkLocationBoundingBox\n"
     "Latitude:30.556555Longitude:-97.659824\n"
     "SecondLatitude:30.569138SecondLongitude:-97.650855")

lst = []
for match in re.finditer(regex, s):
    lst.append(match.group(2))  # latitude
    lst.append(match.group(3))  # longitude
print(lst)

Output

['30.556555', '-97.659824', '30.569138', '-97.650855']

A less strict pattern could match optional word characters before Latitude and Longitude:

\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)
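One way to see what "less strict" means in practice: on a hypothetical line where the prefixes disagree (`Latitude:` paired with `SecondLongitude:`, an input invented for illustration), the backreference pattern finds nothing, while the `\w*` pattern still returns the pair because each `\w*` is matched independently.

```python
import re

strict = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"
loose = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"

# Hypothetical line whose prefixes do not agree.
line = "Latitude:1.0SecondLongitude:2.0"

strict_hits = re.findall(strict, line)
loose_hits = re.findall(loose, line)
print(strict_hits)  # []
print(loose_hits)   # [('1.0', '2.0')]
```

Whether that leniency is desirable depends on how consistent the incoming emails are.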


In that case, you can also use re.findall to return the group values as a list of tuples, if that suits you:

import re

pattern = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"
s = ("WorkLocationBoundingBox\n"
     "Latitude:30.556555Longitude:-97.659824\n"
     "SecondLatitude:30.569138SecondLongitude:-97.650855")
# With 2 capture groups, findall returns one (latitude, longitude) tuple per match
print(re.findall(pattern, s))

Output

[('30.556555', '-97.659824'), ('30.569138', '-97.650855')]
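If you want exactly the shape from the question (a single tuple of numbers inside a list), a minimal sketch is to flatten the findall pairs and convert each string with `float`. The names `coords` and `result` are illustrative only.

```python
import re

pattern = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"
s = ("WorkLocationBoundingBox\n"
     "Latitude:30.556555Longitude:-97.659824\n"
     "SecondLatitude:30.569138SecondLongitude:-97.650855")

# findall yields one (lat, long) string pair per match; flatten the pairs
# in order and convert each value to float to build a single tuple.
coords = tuple(float(value) for pair in re.findall(pattern, s) for value in pair)
result = [coords]
print(result)  # [(30.556555, -97.659824, 30.569138, -97.650855)]
```

Keeping the values as `float` rather than strings makes them directly usable for distance or bounding-box calculations.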
