I have the following vector:
column_names <- c("6Li", "7Li", "10B", "11B", "7Li.1",
"205Pb", "206Pb", "207Pb", "238U",
"206Pb.1", "238U.1")
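Note that some of the values just have a ".1" stuck on the end. I want to index out all of those strings along with their corresponding base strings, so that only the following is returned: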
#[1] "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"
Assume you don't know the index positions, so you can't simply index the values out like column_names[c(2,5,7,9,10,11)]. How can I use pattern matching to extract these values?
There may be a more elegant solution, but in base R you could try a combination of grep/gsub and paste:
idx <- grep(paste(gsub("\\.1", "", column_names[grep("\\.1", column_names)]), collapse = "|"), column_names)
# [1] 2 5 7 9 10 11
column_names[idx]
# [1] "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"
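The alternation built by paste(..., collapse = "|") matches anywhere inside a name, so a hypothetical name such as "17Li" would also be picked up by the stem "7Li". A minimal sketch that anchors the pattern to the whole name, assuming the suffix is always a literal ".1" at the very end:

```r
column_names <- c("6Li", "7Li", "10B", "11B", "7Li.1",
                  "205Pb", "206Pb", "207Pb", "238U",
                  "206Pb.1", "238U.1")

# Stems of the names that end in ".1", e.g. "7Li.1" -> "7Li"
stems <- sub("\\.1$", "", grep("\\.1$", column_names, value = TRUE))

# "^...$" forces each stem to match the whole name, optionally followed by ".1"
pattern <- paste0("^(", paste(stems, collapse = "|"), ")(\\.1)?$")

idx <- grep(pattern, column_names)
idx
# [1]  2  5  7  9 10 11
column_names[idx]
```

If a stem could itself contain regex metacharacters, it would need escaping before being pasted into the pattern.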
Use gsub() and duplicated() to find values that share a stem:
column_stems <- gsub("\\.1", "", column_names)
dup_idx <- duplicated(column_stems) | duplicated(column_stems, fromLast = TRUE)
column_names[dup_idx]
# [1] "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"
To also catch instances ending in .2, .3, and so on, use "\\.\\d+" instead of "\\.1" in gsub().
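A sketch of that generalisation on a hypothetical vector with an extra "7Li.2" entry, added purely for illustration. The "$" anchor is also an addition here; it restricts the match to a suffix at the end of the name:

```r
# Hypothetical vector: "7Li.2" is not in the original data
column_names2 <- c("6Li", "7Li", "10B", "11B", "7Li.1", "7Li.2",
                   "205Pb", "206Pb", "207Pb", "238U",
                   "206Pb.1", "238U.1")

# Strip any trailing ".<digits>" suffix to get each name's stem
column_stems <- gsub("\\.\\d+$", "", column_names2)

# Keep every name whose stem occurs more than once (in either direction)
dup_idx <- duplicated(column_stems) | duplicated(column_stems, fromLast = TRUE)
column_names2[dup_idx]
# [1] "7Li"     "7Li.1"   "7Li.2"   "206Pb"   "238U"    "206Pb.1" "238U.1"
```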
You can use stringr:
library(stringr)
idx <- str_extract(column_names, ".*(?=\\.1)")
column_names[str_detect(column_names, paste(idx[!is.na(idx)], collapse = "|"))]
This returns:
#> [1] "7Li" "7Li.1" "206Pb" "238U" "206Pb.1" "238U.1"
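The stringr answer pastes the extracted stems back into a regex, so a stem that is a substring of another name, or that contains regex metacharacters, could produce extra matches. A minimal base-R sketch that compares stems exactly with %in% instead, assuming the suffix is always a literal ".1" at the end of the name:

```r
column_names <- c("6Li", "7Li", "10B", "11B", "7Li.1",
                  "205Pb", "206Pb", "207Pb", "238U",
                  "206Pb.1", "238U.1")

# Drop a trailing ".1" from every name to get its stem
stems <- sub("\\.1$", "", column_names)

# Stems that occur with a ".1" suffix somewhere in the vector
suffixed_stems <- stems[grepl("\\.1$", column_names)]

# Exact string comparison, no regex built from the data
column_names[stems %in% suffixed_stems]
# [1] "7Li"     "7Li.1"   "206Pb"   "238U"    "206Pb.1" "238U.1"
```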