r - Choosing the right regular expression for coordinates



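I have coordinates in a variety of formats and am trying to build a more or less generic conversion routine.

To do this, I parse the individual elements of each string with a regular expression and then use their order of appearance in the string to recover the degrees, minutes, and seconds.

For some inputs it works, but not for all of them. I am fairly sure the problem comes down to my limited understanding of regex.

So the question is: who has a better grasp of regex patterns and might be able to help?

I have put together a small piece of code to demonstrate the problem. Running the example below shows that I get three components for the first four and the last three coordinates. The rest (those in between) yield only two components.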

coords = c("-53°30''30.54'",
"s55°30' 30.54",
"55°30'30.54n",
"0°1 0.5S",
"-0°30'30''s",
"S55 30 30",
"-55°30'30''",
"-55° 30' 30''",
"-55°   30'   30",
"-55 sometimes with text rests 30 30''",
"55°30'30,54S",
"S55° 30' 30,54",
"-55° 30' 30.54''"
)
for (i in seq_along(coords)) {
  pattern <- gregexpr("[0-9.]+", coords[i])
  print(as.character(unique(unlist(regmatches(coords[i], pattern)))))
}

<Output>
[1] "53"    "30"    "30.54"
[1] "55"    "30"    "30.54"
[1] "55"    "30"    "30.54"
[1] "0"   "1"   "0.5"
[1] "0"  "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30" "54"
[1] "55" "30" "54"
[1] "55"    "30"    "30.54"

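The regular expression below is quite an impressive monster ;-) However, it runs into problems when the coordinates are formatted slightly differently (e.g. dec_deg). In those cases the first or second number in the string is not recognized correctly. I have just compiled a list of such coordinates: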

coords = c("-53°30''30.54'",
"s55°30' 30.54",
"55°30'30.54n",
"0°1 0.5S",
"-0°30'30''s",
"S55 30 30",
"-55°30'30''",
"-55° 30' 30''",
"-55°   30'   30",
"-55 sometimes with text rests 30 30''",
"55°30'30,54S",
"S55° 30' 30,54",
"-55° 30' 30.54''",
"-55.5432 30 30.54",
"-55.30.30",
"55.555",
"55555S",
"S55555",
"S55.555",
"55555°S",
"55.555°",
"-55555",
"-55.555"
)
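One way to tell decimal degrees apart from degree/minute/second strings, sketched here in base R, is to count how many numeric components each string contains before deciding how to interpret them. The sample vector `x` and the names `nums` and `n_parts` are made up for this illustration; `perl = TRUE` is used so that `\\d` and `(?:...)` follow PCRE rules.

```r
# "\u00b0" is the degree sign, written as an escape so the script does not
# depend on the file encoding.
x <- c("55.555\u00b0", "S55.555", "-55.5432 30 30.54", "-55\u00b0 30' 30''")

# A "number" here is an optional minus sign, digits, and an optional
# decimal part separated by a dot or a comma.
nums <- regmatches(x, gregexpr("-?\\d+(?:[.,]\\d+)?", x, perl = TRUE))

# Number of numeric components found in each string.
n_parts <- lengths(nums)
print(n_parts)
```

Strings with one component could then be treated as decimal degrees and strings with three as degrees, minutes, and seconds. Inputs such as "-55.30.30" remain ambiguous and would need a rule of their own.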

It seems to work with stringr...

library(stringr)
str_extract_all(str_replace_all(coords, ",", "."), "[0-9.\\-]+")
[[1]]
[1] "-53"   "30"    "30.54"
[[2]]
[1] "55"    "30"    "30.54"
[[3]]
[1] "55"    "30"    "30.54"
[[4]]
[1] "0"   "1"   "0.5"
[[5]]
[1] "-0" "30" "30"
[[6]]
[1] "55" "30" "30"
[[7]]
[1] "-55" "30"  "30" 
[[8]]
[1] "-55" "30"  "30" 
[[9]]
[1] "-55" "30"  "30" 
[[10]]
[1] "-55" "30"  "30" 
[[11]]
[1] "55"    "30"    "30.54"
[[12]]
[1] "55"    "30"    "30.54"
[[13]]
[1] "-55"   "30"    "30.54"
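For readers without stringr, roughly the same extraction can be sketched in base R. Two things differ from the loop in the question: decimal commas are turned into dots first, and `unique()` is dropped, since it collapses a repeated "30" into a single value. The three sample strings and the names `normalised` and `parts` are chosen only for this illustration.

```r
# "\u00b0" is the degree sign, written as an escape so the script does not
# depend on the file encoding.
coords <- c("-0\u00b030'30''s", "55\u00b030'30,54S", "S55 30 30")

# Replace decimal commas with dots so that "30,54" stays one number.
normalised <- gsub(",", ".", coords, fixed = TRUE)

# Extract every run of digits, dots, and minus signs. No unique() here,
# so repeated values such as "30" "30" are kept.
parts <- regmatches(normalised, gregexpr("[0-9.-]+", normalised))
print(parts)
```

Like the stringr pattern, this character class would also accept stray runs such as "--" or "..", so it is only suitable for reasonably clean input.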

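We can try using regexec together with regmatches to match exactly three numbers in each line. A "number" is defined here as an integer, or an integer with a decimal component (where the decimal separator is a dot or a comma).

We can use do.call to convert the resulting list of vectors into a matrix.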

regex <- "^.*?(-?\\d+(?:[,.]\\d+)?).*?(-?\\d+(?:[,.]\\d+)?).*?(-?\\d+(?:[,.]\\d+)?).*$"
do.call(rbind, lapply(regmatches(coords, regexec(regex, coords)), function(x) x[2:4]))
      [,1]  [,2] [,3]
 [1,] "-53" "30" "30.54"
 [2,] "55"  "30" "30.54"
 [3,] "55"  "30" "30.54"
 [4,] "0"   "1"  "0.5"
 [5,] "-0"  "30" "30"
 [6,] "55"  "30" "30"
 [7,] "-55" "30" "30"
 [8,] "-55" "30" "30"
 [9,] "-55" "30" "30"
[10,] "-55" "30" "30"
[11,] "55"  "30" "30,54"
[12,] "55"  "30" "30,54"
[13,] "-55" "30" "30.54"
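Since the stated goal is a conversion routine, the three captured components could be turned into signed decimal degrees along the following lines. This is only a sketch: `dms_to_decimal` is a hypothetical helper, not part of the answers above, and it assumes that either a leading minus sign or a leading/trailing S or W marks a negative coordinate. It passes `perl = TRUE` to regexec so that the lazy quantifiers follow PCRE rules.

```r
# Hypothetical helper (not from the original answers): parse three numeric
# components per string and return signed decimal degrees.
dms_to_decimal <- function(x) {
  rx <- "^.*?(-?\\d+(?:[,.]\\d+)?).*?(-?\\d+(?:[,.]\\d+)?).*?(-?\\d+(?:[,.]\\d+)?).*$"
  m <- do.call(rbind, lapply(regmatches(x, regexec(rx, x, perl = TRUE)),
                             function(p) p[2:4]))
  # Decimal commas become dots before the numeric conversion.
  vals <- matrix(as.numeric(sub(",", ".", m, fixed = TRUE)), ncol = 3)
  # Assumption: a leading minus OR a leading/trailing S or W means negative.
  # The sign is read from the text because as.numeric("-0") loses it.
  neg <- startsWith(m[, 1], "-") |
    grepl("^\\s*[SsWw]|[SsWw]\\s*$", x, perl = TRUE)
  ifelse(neg, -1, 1) * (abs(vals[, 1]) + vals[, 2] / 60 + vals[, 3] / 3600)
}

# "\u00b0" is the degree sign, escaped to avoid file-encoding issues.
x <- c("-53\u00b030''30.54'", "55\u00b030'30,54S",
       "55\u00b030'30.54n", "-0\u00b030'30''s")
result <- dms_to_decimal(x)
print(result)
```

Strings that do not contain three numbers produce NA under this sketch, and a string such as "-0°30'30''s", which carries both a minus sign and an "s", is simply treated as negative once.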
