R - Extract the substring between special characters



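Suppose I have a string:

"Region/Country/Industry/Product"

I want to extract only the characters between the n-th and the m-th single slash. Is there a one-liner using existing functions that does this?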

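For example, if I wanted the string between the 2nd and the 3rd slash of each entry in the following character vector:

c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung", 
"AMER/US/Wireless/Verizon")

the output of such a function would be:

c("Automotive", "Technology", "Wireless")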

We can use sub to capture the word before the last /, and refer to the captured group with a backreference (\\1) in the replacement:

sub(".*[/](\\w+)[/]\\w+$", "\\1", str1)
#[1] "Automotive" "Technology" "Wireless"  
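One thing worth checking about the pattern above: it is anchored to the end of the string, so it returns the next-to-last field rather than the third one. The two coincide here only because every entry has exactly four fields. A small sketch (the helper name `penultimate_field` and the five-field test string are made up for illustration):

```r
# Hypothetical wrapper around the end-anchored pattern.
penultimate_field <- function(x) {
  sub(".*[/](\\w+)[/]\\w+$", "\\1", x)
}

str1 <- c("EMEA/Germany/Automotive/Mercedes",
          "APAC/SouthKorea/Technology/Samsung",
          "AMER/US/Wireless/Verizon")

# Four fields: the next-to-last field is also the third one.
penultimate_field(str1)

# Five fields: the same call now yields the fourth field ("d"), not the third.
penultimate_field("a/b/c/d/e")
```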

Another variant counts fields from the start of the string instead:

sub("^([^/]+[/]){2}([^/]+).*", "\\2", str1)
#[1] "Automotive" "Technology" "Wireless"  
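The `{2}` in that pattern is the number of leading fields to skip, so the same idea can be parameterised. A minimal sketch, assuming the delimiter is always `/` and no field is empty; the function name `nth_field` is hypothetical, and `perl = TRUE` is a choice made here so that a repeat count of zero is handled by PCRE:

```r
# Hypothetical helper: return the n-th "/"-separated field by skipping n - 1 fields.
nth_field <- function(x, n) {
  stopifnot(n >= 1)
  pattern <- sprintf("^([^/]+[/]){%d}([^/]+).*", as.integer(n - 1))
  # Group 2 is the field that follows the skipped ones.
  sub(pattern, "\\2", x, perl = TRUE)
}

str1 <- c("EMEA/Germany/Automotive/Mercedes",
          "APAC/SouthKorea/Technology/Samsung",
          "AMER/US/Wireless/Verizon")

nth_field(str1, 3)  # field between the 2nd and 3rd slash
nth_field(str1, 1)  # field before the 1st slash
```

Note that when an entry has fewer than n fields the pattern does not match, and sub returns that entry unchanged rather than NA.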

Or split the strings at the delimiter / and extract the word by position:

sapply(strsplit(str1, "/"), `[`, 3)
#[1] "Automotive" "Technology" "Wireless"  
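The split-and-index approach extends naturally to the general "between the n-th and m-th slash" case from the question: those characters are fields n + 1 through m, re-joined with the delimiter. The sketch below is one possible way to write it; `between_delims` and its arguments are hypothetical names, and it treats the end of the string like a closing slash.

```r
# Hypothetical helper: text between the n-th and m-th delimiter (fields n+1..m).
between_delims <- function(x, n, m, delim = "/") {
  stopifnot(n >= 0, m > n)
  parts <- strsplit(x, delim, fixed = TRUE)
  vapply(parts, function(p) {
    # Too few fields: return NA instead of a partial result.
    if (length(p) < m) return(NA_character_)
    paste(p[(n + 1):m], collapse = delim)
  }, character(1))
}

str1 <- c("EMEA/Germany/Automotive/Mercedes",
          "APAC/SouthKorea/Technology/Samsung",
          "AMER/US/Wireless/Verizon")

between_delims(str1, 2, 3)     # single field, as in the question
between_delims(str1[1], 1, 3)  # spans two fields: "Germany/Automotive"
```

Using vapply with `character(1)` instead of sapply keeps the return type fixed even for empty input.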

Data

str1 <-  c("EMEA/Germany/Automotive/Mercedes", 
"APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")

And of course there is a stringr solution (with x holding the character vector from the question):

library(stringr)
word(x, 3, sep = '/')
#[1] "Automotive" "Technology" "Wireless"

You can also use the strsplit function as shown below and adjust the position:

x <- c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")
sapply(x, FUN = function(x) {
  y <- unlist(strsplit(x, split = "/"))
  y[3]  # change this index to pick a word at a different position
})
# "Automotive"                       "Technology"                         "Wireless" 

You can also delete the parts you don't need:

strings <- c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung","AMER/US/Wireless/Verizon")
gsub("^([^/]*/){2}|/[^/]*$","",strings)
#[1] "Automotive" "Technology" "Wireless" 
