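Suppose I have a string:
"Region/Country/Industry/Product"
I want to extract only the characters between the nth and mth single slashes. Is there a one-liner using existing functions that does this?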
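For example, suppose I want the string between the 2nd and 3rd slashes of each entry in the following character vector:
c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung",
"AMER/US/Wireless/Verizon")
The output of such a function would be:
c("Automotive", "Technology", "Wireless")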
We can use sub to capture the word before the last /, and put a backreference to the captured group (\\1) in the replacement:
sub(".*[/](\\w+)[/]\\w+$", "\\1", str1)
#[1] "Automotive" "Technology" "Wireless"
Another variant skips the first two fields and keeps the third:
sub("^([^/]+[/]){2}([^/]+).*", "\\2", str1)
#[1] "Automotive" "Technology" "Wireless"
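The {2} quantifier in that pattern is what selects the 3rd field. As a sketch (the name field_between is hypothetical, not a base R function), the same idea can be parameterised for any n < m by building the pattern with sprintf(). It uses perl = TRUE so that the {0} quantifier produced when m = n + 1 is handled by PCRE, and it returns NA for strings with fewer than m slashes, because sub() would otherwise return those strings unchanged.

```r
# Hypothetical helper: text between the n-th and m-th "/" (requires 1 <= n < m)
field_between <- function(x, n, m) {
  stopifnot(n >= 1, m > n)
  # Skip n fields, capture the next (m - n) fields, then require the m-th "/"
  pattern <- sprintf("^(?:[^/]*/){%d}([^/]*(?:/[^/]*){%d})/.*$", n, m - n - 1)
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() leaves non-matching strings untouched, so mark them as missing
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

str1 <- c("EMEA/Germany/Automotive/Mercedes",
          "APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")
field_between(str1, 2, 3)
#[1] "Automotive" "Technology" "Wireless"
field_between(str1, 1, 3)
#[1] "Germany/Automotive"    "SouthKorea/Technology" "US/Wireless"
```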
Or split the strings at the delimiter / and extract the word by position:
sapply(strsplit(str1, "/"), `[`, 3)
#[1] "Automotive" "Technology" "Wireless"
Data:
str1 <- c("EMEA/Germany/Automotive/Mercedes",
"APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")
And of course there is a stringr solution:
library(stringr)
word(str1, 3, sep = "/")
#[1] "Automotive" "Technology" "Wireless"
You can also use strsplit as shown below and adjust the position:
x <- c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")
sapply(x, FUN = function(s) {
  y <- unlist(strsplit(s, split = "/"))
  y[3] # change this index depending on the position of the word you want
}, USE.NAMES = FALSE)
#[1] "Automotive" "Technology" "Wireless"
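Picking one element of the split only works when the wanted text is a single field. For the general "between the nth and mth slash" case, where the result can span several fields, a minimal sketch (the helper name between_slashes is hypothetical) locates the slash positions with gregexpr() and cuts the string with substr():

```r
# Hypothetical helper: text between the n-th and m-th "/" (requires 1 <= n < m)
between_slashes <- function(x, n, m) {
  stopifnot(n >= 1, m > n)
  # Character positions of every "/" in each string (-1 when there is none)
  pos <- gregexpr("/", x, fixed = TRUE)
  vapply(seq_along(x), function(i) {
    p <- pos[[i]]
    # Fewer than m slashes (this also covers the no-match -1 case, since m >= 2)
    if (length(p) < m) return(NA_character_)
    substr(x[i], p[n] + 1, p[m] - 1)
  }, character(1))
}

str1 <- c("EMEA/Germany/Automotive/Mercedes",
          "APAC/SouthKorea/Technology/Samsung", "AMER/US/Wireless/Verizon")
between_slashes(str1, 2, 3)
#[1] "Automotive" "Technology" "Wireless"
between_slashes(str1, 1, 3)
#[1] "Germany/Automotive"    "SouthKorea/Technology" "US/Wireless"
```

Working on character positions rather than split fields means adjacent slashes ("a//b") simply yield an empty string instead of shifting the field indices.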
You can also remove the parts you don't need (the first two fields and the last one):
strings <- c("EMEA/Germany/Automotive/Mercedes", "APAC/SouthKorea/Technology/Samsung","AMER/US/Wireless/Verizon")
gsub("^([^/]*/){2}|/[^/]*$","",strings)
#[1] "Automotive" "Technology" "Wireless"