使用正则表达式捕获组解析地址



我正试图将地址解析成组,我有这个正则表达式:

(^.*?(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),(?:)? ?(.*?),? ?([A-Z]{2,3}),? ?(d{,4})$

,它捕获并分组这些地址:

139 McKinnon Road, PINELANDS, NT, 829
108 East Point Road, Fannie Bay, NT, 820
3-11 Hamilton Street, Townsville City, QLD, 4810
40 17 Geranium Street, THE GARDENS, NT, 820
Lot 9 Island Point Road, ST GEORGES BASIN, NSW, 2540
316 Sturt Street and 511 Flinders Street, Townsville City, QLD, 4810

但不捕获以下格式的地址:

1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155
31 Stephen Street SOUTH TOOWOOMBA QLD 4350

我想把这些地址分成不同的组,如:

street_address = 1, 3, 5 Demeter Street & 12 Hermes Avenue
subrub = ROUSE HILL
state = QLD
postcode = 4350

如何使用上述表达式捕获这两个地址?这是我的Regex代码

您可以使用以下特定的正则表达式分别匹配您的四个组:

  • 组1,包含名为<street_address>:
  • 的地址。
.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)
  • 组2,包含称为<subrub>的郊区:
[A-Za-z ]+
  • 3,包含状态,称为<state>:
[A-Z]+
  • 集团4,包含邮政编码,称为:
d+

Yourfinal regex是使用一个可选的逗号和一个强制的空格,?将这些正则表达式连接起来。

(?P<street_address>.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),? (?P<subrub>[A-Za-z ]+),? (?P<state>[A-Z]+),? (?P<postcode>d+)

点击这里查看演示。

注意:在您的Python代码中,您将能够根据其相应的名称提取每个组。

相关内容

  • 没有找到相关文章

最新更新