我正试图将地址解析成组,我有这个正则表达式:
(^.*?(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),(?:)? ?(.*?),? ?([A-Z]{2,3}),? ?(d{,4})$
,它捕获并分组这些地址:
139 McKinnon Road, PINELANDS, NT, 829
108 East Point Road, Fannie Bay, NT, 820
3-11 Hamilton Street, Townsville City, QLD, 4810
40 17 Geranium Street, THE GARDENS, NT, 820
Lot 9 Island Point Road, ST GEORGES BASIN, NSW, 2540
316 Sturt Street and 511 Flinders Street, Townsville City, QLD, 4810
但不捕获以下格式的地址:
1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155
31 Stephen Street SOUTH TOOWOOMBA QLD 4350
我想把这些地址分成不同的组,如:
street_address = 1, 3, 5 Demeter Street & 12 Hermes Avenue
subrub = ROUSE HILL
state = QLD
postcode = 4350
如何使用上述表达式捕获这两个地址?这是我的Regex代码
您可以使用以下特定的正则表达式分别匹配您的四个组:
-
组1,包含名为
<street_address>
: 的地址。
.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)
- 组2,包含称为
<subrub>
的郊区:
[A-Za-z ]+
组- 3,包含状态,称为
<state>
:
[A-Z]+
- 集团4,包含邮政编码,称为:
d+
Yourfinal regex是使用一个可选的逗号和一个强制的空格,?
将这些正则表达式连接起来。
(?P<street_address>.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),? (?P<subrub>[A-Za-z ]+),? (?P<state>[A-Z]+),? (?P<postcode>d+)
点击这里查看演示。
注意:在您的Python代码中,您将能够根据其相应的名称提取每个组。