R - Extract the nth element from an underscore-delimited string



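I want to extract myproducta and myproductb.

I thought a regex would do it, but my pattern only works for the cc string, not for aa. How come? As far as I can tell, they have the same length.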

aa <- "e220juju_uk_yy_aon_aon_conversion_mystore_facebook-network_ppl_primaria_myproducta_galaxycombos_20220520"
cc <- "e220tyty_bo_oo_aon_aon_conversion_mystore_facebook-network_ppl_lal_myproductb_wd95m4473mw_diasdecyber_20220718"

My regex attempt:

gsub(cc, pattern = ".*_.*_.*_.*_.*_.*_.*_.*_.*_(.*)_.*_.*_.*", replacement = "\\1", perl = TRUE) # works: returns myproductb
gsub(aa, pattern = ".*_.*_.*_.*_.*_.*_.*_.*_.*_(.*)_.*_.*_.*", replacement = "\\1", perl = TRUE) # does not work: returns primaria

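You can use anchors and a negated character class: match a run of non-underscore characters followed by an underscore, repeat that 10 times from the start of the string, then capture the 11th field.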

^(?:[^_]*_){10}([^_]*).*$


aa <- "e220juju_uk_yy_aon_aon_conversion_mystore_facebook-network_ppl_primaria_myproducta_galaxycombos_20220520"
cc <- "e220tyty_bo_oo_aon_aon_conversion_mystore_facebook-network_ppl_lal_myproductb_wd95m4473mw_diasdecyber_20220718"
pattern <- "^(?:[^_]*_){10}([^_]*).*$"
gsub(pattern, "\\1", aa, perl = TRUE)
gsub(pattern, "\\1", cc, perl = TRUE)

Output:

[1] "myproducta"
[1] "myproductb"
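
As for why the pattern from the question behaves differently on the two strings, a likely explanation is that they do not contain the same number of fields, and a greedy `.*` is free to swallow underscores. The sketch below counts the fields and compares both patterns; the variable names `n_fields`, `greedy`, `anchored`, `from_end` and `from_start` are only illustrative.

```r
aa <- "e220juju_uk_yy_aon_aon_conversion_mystore_facebook-network_ppl_primaria_myproducta_galaxycombos_20220520"
cc <- "e220tyty_bo_oo_aon_aon_conversion_mystore_facebook-network_ppl_lal_myproductb_wd95m4473mw_diasdecyber_20220718"

# Count the underscore-separated fields in each string
n_fields <- lengths(strsplit(c(aa = aa, cc = cc), "_", fixed = TRUE))
n_fields  # aa has 13 fields, cc has 14

# Pattern from the question: the first greedy `.*` absorbs any surplus
# fields, so the capture is effectively the 4th field from the END
greedy <- ".*_.*_.*_.*_.*_.*_.*_.*_.*_(.*)_.*_.*_.*"
from_end <- sub(greedy, "\\1", c(aa, cc), perl = TRUE)
from_end  # "primaria" "myproductb"

# Anchored pattern: `[^_]*` cannot cross an underscore, so the capture
# is always the 11th field from the START
anchored <- "^(?:[^_]*_){10}([^_]*).*$"
from_start <- sub(anchored, "\\1", c(aa, cc), perl = TRUE)
from_start  # "myproducta" "myproductb"
```

With 14 fields, the 4th field from the end of cc happens to be the 11th from the start, which is why the original pattern appeared to work there.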

Here are some alternative approaches:

read.table(text = aa, sep = "_")[[11]]
## [1] "myproducta"
strsplit(aa, "_")[[1]][11]
## [1] "myproducta"
scan(text = aa, sep = "_", what = "", quiet = TRUE)[11]
## [1] "myproducta"
sub("^(([^_]*)_){10}([^_]*)_.*", "\\3", aa)
## [1] "myproducta"
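
If the same extraction is needed for a whole vector of strings, the `strsplit` approach can be wrapped in a small function. This is a minimal sketch: `nth_field` is a hypothetical helper name, not an existing base R function, and it assumes the separator should be treated as a literal string.

```r
aa <- "e220juju_uk_yy_aon_aon_conversion_mystore_facebook-network_ppl_primaria_myproducta_galaxycombos_20220520"
cc <- "e220tyty_bo_oo_aon_aon_conversion_mystore_facebook-network_ppl_lal_myproductb_wd95m4473mw_diasdecyber_20220718"

# Hypothetical helper: return the n-th field of every element of x
nth_field <- function(x, n, sep = "_") {
  # fixed = TRUE treats sep as a literal string, not a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  # Indexing past the end of a vector yields NA, so strings with
  # fewer than n fields produce NA_character_ instead of an error
  vapply(parts, function(p) p[n], character(1))
}

nth_field(c(aa, cc), 11)
## [1] "myproducta" "myproductb"
nth_field("a_b", 3)
## [1] NA
```

Splitting once and indexing keeps the field position explicit, which tends to be easier to adjust than a regex with a hard-coded repetition count.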
