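r - Select columns whose names start and end with a fixed pattern and have a variable middle part

I want to select data frame columns whose names start and end with specific patterns and contain one of several possible values in the middle. The following works, but I find the double use of intersect() inelegant.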

library(dplyr)

df <- data.frame(var1_one_num = sample(1:10, 10, replace = TRUE),
                 var1_two_num = sample(1:10, 10, replace = TRUE),
                 var1_three_num = sample(1:10, 10, replace = TRUE),
                 var1_four_num = sample(1:10, 10, replace = TRUE),
                 var2_one_num = sample(1:10, 10, replace = TRUE),
                 var1_one_fac = sample(1:10, 10, replace = TRUE))
var_middle <- c("one|two|three")
df %>% select(intersect(starts_with("var1_"),
                        intersect(matches(var_middle),
                                  ends_with("_num")))) %>% names()
[1] "var1_one_num"   "var1_two_num"   "var1_three_num"

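I suspect there is a smarter way with any_of() or something similar, but I can't wrap my head around it.

It looks like you only need the column names. You can get those with a regular expression: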

> grep(pattern = '^var1.*(one|two|three).*num$', x = colnames(df), value = TRUE)
[1] "var1_one_num"   "var1_two_num"   "var1_three_num"

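The ^ symbol anchors the pattern to the start of the string, and $ anchors it to the end. The parentheses with | separators mean that any one of the listed alternatives is accepted. Each .* matches any run of characters (including none), so this pattern also accepts extra text between the three parts.

To get the column values: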
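To see what each part of the pattern contributes, here is a small sketch on a vector of made-up names. The vector `nms` and the stricter pattern `strict` are illustrative assumptions, not part of the original answer; `strict` assumes the middle part is exactly one token surrounded by underscores.

```r
# Hypothetical column names; the last one is a deliberately tricky case
nms <- c("var1_one_num", "var1_four_num", "var2_one_num",
         "var1_one_fac", "var1_zone_fac_num")

loose  <- "^var1.*(one|two|three).*num$"  # pattern from the answer
strict <- "^var1_(one|two|three)_num$"    # assumption: exactly one middle token

loose_hits  <- grepl(loose, nms)   # ".*" lets "var1_zone_fac_num" through
strict_hits <- grepl(strict, nms)  # only "var1_one_num" matches

nms[loose_hits]
nms[strict_hits]
```

If the column names always follow the prefix_middle_suffix layout, the stricter form is probably the safer choice, since it cannot match a middle value that is only a substring of a longer word.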

> df[, grep(pattern = '^var1.*(one|two|three).*num$', x = colnames(df), value = TRUE)]
   var1_one_num var1_two_num var1_three_num
1             9            1              7
2             2           10              4
3             2            9              1
4             1            5              4
5             4            9             10
6             6            8              8
7             9            5              7
8             6            2              6
9             5            3              5
10            1            1              7
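One caveat with the base R subsetting shown above: if the pattern happens to match a single column, `df[, cols]` returns a plain vector rather than a data frame. A minimal sketch, using a made-up two-column data frame `df1` (an assumption for illustration only):

```r
# Hypothetical data frame in which only one column matches the pattern
df1 <- data.frame(var1_one_num = 1:3, var1_one_fac = 4:6)

cols <- grep("^var1.*(one|two|three).*num$", colnames(df1), value = TRUE)

as_vector <- df1[, cols]                # single match collapses to an integer vector
as_df     <- df1[, cols, drop = FALSE]  # drop = FALSE keeps a one-column data frame
```

Passing `drop = FALSE` keeps the result type stable regardless of how many columns match.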

If you are not familiar with regex, here is a good link to learn more: https://cran.r-project.org/web/packages/stringr/vignettes/regular-expressions.html

Hope this helps!

This is @tmfmnk's answer. I suggested that he post it, but so far he hasn't. Since I wanted a dplyr solution, this is what I was looking for:

df %>% select(matches("^var1_(one|two|three)_.*num$"))
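Two variations on this may be useful; both are sketches under stated assumptions rather than part of the original answer. The first builds the regex from a character vector of middle values (the vector `middle` is a hypothetical name). The second combines the selection helpers with `&`, which assumes a tidyselect version that supports boolean operators on helpers (1.0.0 or later, as shipped with dplyr 1.0.0 and later) and avoids the nested intersect() calls from the question.

```r
library(dplyr)

# Small deterministic stand-in for the data frame in the question
df <- data.frame(var1_one_num = 1:2, var1_two_num = 1:2, var1_three_num = 1:2,
                 var1_four_num = 1:2, var2_one_num = 1:2, var1_one_fac = 1:2)

middle <- c("one", "two", "three")  # hypothetical vector of allowed middle parts
alternation <- paste(middle, collapse = "|")

# Variation 1: assemble a single anchored regex from the vector
pattern <- paste0("^var1_(", alternation, ")_num$")
by_regex <- df %>% select(matches(pattern)) %>% names()

# Variation 2: combine selection helpers with & instead of intersect()
by_helpers <- df %>%
  select(starts_with("var1_") & matches(alternation) & ends_with("_num")) %>%
  names()
```

Both selections return the same three columns for this data. The single regex is stricter about where the middle value sits, while the helper version reads closer to the original question.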
