Here is a sample of my data:
exp_data <- structure(list(Seq = c("AAAARVDS", "AAAARVDSSSAL",
"AAAARVDSRASDQ"), Change = structure(c(19L, 20L, 13L), .Label = c("",
"C[+58]", "C[+58], F[+1152]", "C[+58], F[+1152], L[+12], M[+12]",
"C[+58], L[+2909]", "L[+12]", "L[+370]", "L[+504]", "M[+12]",
"M[+1283]", "M[+1457]", "M[+1491]", "M[+16]", "M[+16], Y[+1013]",
"M[+16], Y[+1152]", "M[+16], Y[+762]", "M[+371]", "M[+386], Y[+12]",
"M[+486], W[+12]", "Y[+12]", "Y[+1240]", "Y[+1502]", "Y[+1988]",
"Y[+2918]"), class = "factor"), `Mass` = c(1869.943,
1048.459, 707.346), Size = structure(c(2L, 2L, 2L), .Label = c("Matt",
"Greg",
"Kieran"
), class = "factor"), `Number` = c(2L, 2L, 2L)), row.names = c(244L,
392L, 396L), class = "data.frame")
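I would like to draw your attention to the column Change, because that is the column I want to filter on. There are three rows here, and I want to keep only the first one, because the change for one of its letters is greater than 100. In general, I want to keep every row that contains a letter change greater than +100. The Change column may list up to 4-5 letters, but if at least one modification is above +100, I want to keep that row.
Do you have a simple solution?
Expected output: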
Seq Change Mass Size Number
244 AAAARVDS M[+486], W[+12] 1869.943 Greg 2
I'm not entirely sure I have understood your problem statement correctly, but perhaps something like this:
library(dplyr)
library(stringr)
exp_data %>% filter(str_detect(Change, "\\d{3}"))
# Seq Change Mass Size Number
#1 AAAARVDS M[+486], W[+12] 1869.943 Greg 2
Or the same in base R:
exp_data[grep("\\d{3}", exp_data$Change), ]
# Seq Change Mass Size Number
#244 AAAARVDS M[+486], W[+12] 1869.943 Greg 2
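The idea is to use a regular expression to keep only those rows where Change contains at least one three-digit number. Note that this matches any value with three or more digits, so a change of exactly +100 would also be kept.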
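If the threshold needs to be strictly greater than 100, or might change later, the numbers can be compared numerically instead of by digit count. The following is a minimal base R sketch under that assumption; the helper name has_large_change and its threshold argument are hypothetical and do not come from the answers in this thread.

```r
# Hypothetical helper (an illustration, not part of the original answers):
# returns TRUE for each string in which at least one number exceeds `threshold`.
has_large_change <- function(change, threshold = 100) {
  change <- as.character(change)  # factor columns are converted to character
  # One character vector of digit runs per input string (empty if none found)
  matches <- regmatches(change, gregexpr("[0-9]+", change))
  vapply(matches, function(x) any(as.numeric(x) > threshold), logical(1))
}

changes <- c("M[+486], W[+12]", "Y[+12]", "M[+16]", "L[+100]", "")
has_large_change(changes)
```

Applied to the sample data, this would be used as exp_data[has_large_change(exp_data$Change), ]. Unlike the three-digit pattern, a value of exactly +100 is not kept here.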
You can use str_extract_all from the stringr package:
library(stringr)
data.table solution
library(data.table)
setDT(exp_data)
exp_data[, max := max(as.numeric(str_extract_all(Change, "[[:digit:]]+")[[1]])), by = Seq]
exp_data[max > 100, ]
Seq Change Mass Size Number max
1: AAAARVDS M[+486], W[+12] 1869.9 Greg 2 486
dplyr solution
library(dplyr)
exp_data %>%
group_by(Seq) %>%
filter(max(as.numeric(str_extract_all(Change, "[[:digit:]]+")[[1]])) > 100)
# A tibble: 1 x 5
# Groups: Seq [1]
Seq Change Mass Size Number
<chr> <fct> <dbl> <fct> <int>
1 AAAARVDS M[+486], W[+12] 1870. Greg 2
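Both grouped solutions rely on each Seq value identifying exactly one row, because the [[1]] only picks the first element of the list that str_extract_all returns within a group. As a rough sketch, assuming Seq might contain duplicates, the maximum can instead be computed per row without any grouping. The variable names below (changes, digits, max_change, keep) are illustrative only.

```r
library(stringr)

changes <- c("M[+486], W[+12]", "Y[+12]", "M[+16]", "")

# str_extract_all() returns a list with one character vector per input string
digits <- str_extract_all(changes, "[[:digit:]]+")

# Largest number found in each string; NA when a string contains no number
max_change <- vapply(
  digits,
  function(x) if (length(x) > 0) max(as.numeric(x)) else NA_real_,
  numeric(1)
)

# Rows to keep: at least one modification strictly greater than 100
keep <- !is.na(max_change) & max_change > 100
keep
```

With the sample data, the same logical vector could be built from as.character(exp_data$Change) and used as exp_data[keep, ].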