R regex to identify UK postcodes



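My question is similar to this one, but I am looking for something specific to R. I have a data frame with tens of thousands of addresses and need to pull out the postcodes. The postcodes are in the UK, formatted {letter_letter_digit digit_letter_letter}. Something like the following:

" 8, longbow close,\r\nharlescott lane,\r\nshrewsbury,\r\nengland,\r\nsy1 3gz"

I have tried variations of this code with stringr, to no avail: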

str_extract('^(\[Gg]\[Ii]\[Rr] 0\[Aa]{2})|(((\[A-Za-z]\[0-9]{1,2})|((\ 
[A-Za-z]\[A-Ha-hJ-Yj-y]\[0-9]{1,2})|((\[AZa-z]\[0-9]\[A-Za-z])|(\[A-Za- 
z]\[A-Ha-hJ-Yj-y]\[0-9]?\[A-Za-z]))))\[0-9]\[A-Za-z]{2})$',alfa$Address) 

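The ^ and $ anchors pin the match to the start and end of the string. You can wrap the pattern in \b(?:<pattern>)\b instead, so that the codes are matched as whole words (\b is a word boundary). Also, because you escaped the opening [ brackets, the character classes are "ruined": \[ matches a literal [ character. Also, swap the arguments: the first is the input, the second is the regex. Finally, to get all matches, use str_extract_all rather than str_extract.

You can fix the code like this: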

library(stringr)
txt <- "8, Longbow Close,\r\nHarlescott Lane,\r\nShrewsbury,\r\nEngland,\r\nSY1 3GZ"
# Backslashes are doubled because this is an R string literal
pattern <- "\\b(?:([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))\\s?[0-9][A-Za-z]{2}))\\b"
str_extract_all(txt, pattern)
# => [[1]]
#   [1] "SY1 3GZ"
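
The question is about a data frame column rather than a single string, so here is a minimal sketch of applying the same pattern to a whole column. The names alfa and Address come from the question; the sample rows and the Postcode column are made up for illustration. It assumes each address holds at most one postcode, in which case str_extract (first match per element, NA if none) is enough.

```r
library(stringr)

# Sample data standing in for the real data frame (rows are made up)
alfa <- data.frame(
  Address = c(
    "8, longbow close,\r\nharlescott lane,\r\nshrewsbury,\r\nengland,\r\nsy1 3gz",
    "no postcode in this address"
  ),
  stringsAsFactors = FALSE
)

# Same pattern as in the fix above
pattern <- "\\b(?:([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))\\s?[0-9][A-Za-z]{2}))\\b"

# First match per row, NA where nothing matches;
# str_to_upper() normalizes case and keeps NA as NA
alfa$Postcode <- str_to_upper(str_extract(alfa$Address, pattern))
```

With these two rows, the first gets "SY1 3GZ" and the second gets NA. If an address can contain several postcodes, str_extract_all returns a list with one character vector per row instead.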

Here is a more readable approach (in Perl), with one simple pattern per postcode shape:

    # $e{zip} is the postcode without the space, $e{zips} with it
    if ($e{locate} =~ /\b([A-Z])([A-Z])([0-9])([A-Z]) ([0-9])([A-Z])([A-Z])\b/) {         # AA9A 9AA
        $e{zip} = $1.$2.$3.$4.$5.$6.$7;
        $e{zips} = $1.$2.$3.$4.' '.$5.$6.$7;
    } elsif ($e{locate} =~ /\b([A-Z])([0-9])([A-Z]) ([0-9])([A-Z])([A-Z])\b/) {           # A9A 9AA
        $e{zip} = $1.$2.$3.$4.$5.$6;
        $e{zips} = $1.$2.$3.' '.$4.$5.$6;
    } elsif ($e{locate} =~ /\b([A-Z])([0-9]) ([0-9])([A-Z])([A-Z])\b/) {                  # A9 9AA
        $e{zip} = $1.$2.$3.$4.$5;
        $e{zips} = $1.$2.' '.$3.$4.$5;
    } elsif ($e{locate} =~ /\b([A-Z])([0-9])([0-9]) ([0-9])([A-Z])([A-Z])\b/) {           # A99 9AA
        $e{zip} = $1.$2.$3.$4.$5.$6;
        $e{zips} = $1.$2.$3.' '.$4.$5.$6;
    } elsif ($e{locate} =~ /\b([A-Z])([A-Z])([0-9]) ([0-9])([A-Z])([A-Z])\b/) {           # AA9 9AA
        $e{zip} = $1.$2.$3.$4.$5.$6;
        $e{zips} = $1.$2.$3.' '.$4.$5.$6;
    } elsif ($e{locate} =~ /\b([A-Z])([A-Z])([0-9])([0-9]) ([0-9])([A-Z])([A-Z])\b/) {    # AA99 9AA
        $e{zip} = $1.$2.$3.$4.$5.$6.$7;
        $e{zips} = $1.$2.$3.$4.' '.$5.$6.$7;
    }
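
Because the question asks for R, here is a rough R analogue of the Perl snippet above. It is a sketch, not a port supplied by the original answer. It lists the same six outward-code shapes as simple patterns, joins them with |, and uses str_match to capture the two halves. The names outward_shapes, postcode_pattern and extract_postcode are hypothetical. Like the Perl version, it assumes uppercase input and exactly one space between the halves.

```r
library(stringr)

# Outward-code shapes from the Perl snippet: AA9A, A9A, A9, A99, AA9, AA99
outward_shapes <- c(
  "[A-Z]{2}[0-9][A-Z]",
  "[A-Z][0-9][A-Z]",
  "[A-Z][0-9]",
  "[A-Z][0-9]{2}",
  "[A-Z]{2}[0-9]",
  "[A-Z]{2}[0-9]{2}"
)

# \b(outward) (inward)\b, where the inward code is digit-letter-letter
postcode_pattern <- paste0(
  "\\b(", paste(outward_shapes, collapse = "|"), ") ([0-9][A-Z]{2})\\b"
)

extract_postcode <- function(x) {
  m <- str_match(x, postcode_pattern)  # columns: full match, outward, inward
  found <- !is.na(m[, 1])
  data.frame(
    zip  = ifelse(found, paste0(m[, 2], m[, 3]), NA_character_),  # no space
    zips = ifelse(found, paste(m[, 2], m[, 3]), NA_character_),   # with space
    stringsAsFactors = FALSE
  )
}

res <- extract_postcode(c("Shrewsbury,\r\nEngland,\r\nSY1 3GZ", "no postcode here"))
```

For lowercase addresses such as the one in the question, one option is to pass str_to_upper(x) to the function first.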
