Keep only rows where a number in a specific column is greater than a given value



Here is a sample of the data:

exp_data <- structure(list(
  Seq = c("AAAARVDS", "AAAARVDSSSAL", "AAAARVDSRASDQ"),
  Change = structure(c(19L, 20L, 13L), .Label = c(
    "", "C[+58]", "C[+58], F[+1152]", "C[+58], F[+1152], L[+12], M[+12]",
    "C[+58], L[+2909]", "L[+12]", "L[+370]", "L[+504]", "M[+12]",
    "M[+1283]", "M[+1457]", "M[+1491]", "M[+16]", "M[+16], Y[+1013]",
    "M[+16], Y[+1152]", "M[+16], Y[+762]", "M[+371]", "M[+386], Y[+12]",
    "M[+486], W[+12]", "Y[+12]", "Y[+1240]", "Y[+1502]", "Y[+1988]",
    "Y[+2918]"), class = "factor"),
  Mass = c(1869.943, 1048.459, 707.346),
  Size = structure(c(2L, 2L, 2L), .Label = c("Matt", "Greg", "Kieran"),
                   class = "factor"),
  Number = c(2L, 2L, 2L)),
  row.names = c(244L, 392L, 396L), class = "data.frame")

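I would like to draw your attention to the column Change, because that is the column I want to filter on. There are three rows here and I want to keep only the first one, because the change for one of its letters is greater than 100. In general, I want to keep every row that contains a letter change greater than +100. The Change column can list up to 4-5 letters, but if at least one of the modifications is above +100, I want to keep that row.

Do you have a simple solution?

Expected output: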

         Seq          Change     Mass Size Number
244 AAAARVDS M[+486], W[+12] 1869.943 Greg      2

I'm not entirely sure I've understood your problem statement correctly, but perhaps something like this:

library(dplyr)
library(stringr)
# "\\d{3}" matches a run of three digits; the backslash must be doubled inside an R string
exp_data %>% filter(str_detect(Change, "\\d{3}"))
#       Seq          Change     Mass Size Number
#1 AAAARVDS M[+486], W[+12] 1869.943 Greg      2

Or the same in base R:

# Base R subsetting keeps the original row name (244)
exp_data[grep("\\d{3}", exp_data$Change), ]
#         Seq          Change     Mass Size Number
#244 AAAARVDS M[+486], W[+12] 1869.943 Greg      2

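The idea is to use a regular expression to keep only those rows where Change contains at least one run of three digits. Note that this keeps values of 100 or more (a modification of exactly +100 also matches), and it assumes the numbers are written without leading zeros.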
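To make the boundary behaviour of that pattern concrete, here is a minimal sketch that runs it against a few made-up Change values. The vector `changes` is hypothetical and not part of the question's data; it is only chosen to probe edge cases such as exactly +100, a four-digit value, and an empty string.

```r
# Hypothetical Change values (not from the question), chosen to probe edge cases
changes <- c("M[+486], W[+12]", "Y[+12]", "M[+16]",
             "M[+100]", "C[+58], L[+2909]", "")

# TRUE where the string contains at least three consecutive digits
keep <- grepl("\\d{3}", changes)

# Show each value next to the decision the pattern makes for it
print(data.frame(changes, keep))
```

If the cut-off has to be strictly greater than 100, or some other threshold altogether, a digit-count pattern is no longer enough and the numbers need to be extracted and compared numerically.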

You can use str_extract_all from the stringr package.

library(stringr)

data.table solution

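library(data.table)
setDT(exp_data)
# Extract every number in Change and store the largest one for each row
# (grouping by Seq assumes each Seq value occurs only once)
exp_data[, max := max(as.numeric(str_extract_all(Change, "[[:digit:]]+")[[1]])), by = Seq]
exp_data[max > 100, ]
#         Seq          Change   Mass Size Number max
# 1: AAAARVDS M[+486], W[+12] 1869.9 Greg      2 486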

dplyr solution

library(dplyr)
# Keep rows whose largest number in Change exceeds 100
# (grouping by Seq assumes each Seq value is unique)
exp_data %>%
  group_by(Seq) %>%
  filter(max(as.numeric(str_extract_all(Change, "[[:digit:]]+")[[1]])) > 100)
# # A tibble: 1 x 5
# # Groups:   Seq [1]
#   Seq      Change           Mass Size  Number
#   <chr>    <fct>           <dbl> <fct>  <int>
# 1 AAAARVDS M[+486], W[+12] 1870. Greg       2
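
Both versions group by Seq and take `[[1]]`, which only works if every Seq value is unique. As a rough alternative, the sketch below computes the largest number per row in base R without any grouping. The helper `max_change` and the small stand-in data frame `df` are my own names for illustration (the values are copied from the first three columns of exp_data); treat it as one possible approach rather than the definitive one.

```r
# Stand-in for the question's data (values copied from exp_data)
df <- data.frame(
  Seq    = c("AAAARVDS", "AAAARVDSSSAL", "AAAARVDSRASDQ"),
  Change = factor(c("M[+486], W[+12]", "Y[+12]", "M[+16]")),
  Mass   = c(1869.943, 1048.459, 707.346),
  stringsAsFactors = FALSE
)

# Hypothetical helper: largest number found in each string, NA if there is none
max_change <- function(x) {
  nums <- regmatches(x, gregexpr("[0-9]+", x))
  vapply(nums,
         function(n) if (length(n)) max(as.numeric(n)) else NA_real_,
         numeric(1))
}

# Convert the factor to character before matching, then filter on the threshold
largest <- max_change(as.character(df$Change))
result  <- df[!is.na(largest) & largest > 100, ]
print(result)
```

Because the comparison is numeric, changing the threshold is just a matter of replacing 100, and rows with an empty Change are dropped instead of producing a warning from `max()`.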
