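R: Keep a row if all values in group A are greater than all values in group B

I'm new to loops in R.
I have a count table of mRNA transcripts for control and test samples, each in triplicate.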

gene1 <- c(100, 200, 300, 400, 500, 600)
gene2 <- c(600, 500, 400, 300, 200, 100)
gene3 <- c(100, 200, 400, 300, 500, 600)
data <- rbind(gene1, gene2, gene3)
colnames(data) <- c("control1", "control2","control3","test1","test2","test3")
data <- as.data.frame(data)

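I want to check each gene (row) and keep it only if all 3 "control" samples have higher counts than all 3 "test" samples.
A new list should then be created from those rows.
(The real dataset has more than 10,000 rows.)

I tried the code below, as well as the all() function instead of min()/max(), but it does not work.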

control <- data[, c(1,2,3)]
test <- data[, c(4,5,6)]
for (i in 1:nrow(data)){
if(min(control)>max(test)){
list <- rbind(i, list)
}}

Thanks!

Here is a base R approach. The functions used are mostly self-explanatory; if anything is unclear, please leave a comment below.

data[sapply(1:nrow(data), function(x)
data[x, which.min(data[x, 1:3])] > data[x, which.max(data[x, 4:6]) + 3])
, ]

Output

      control1 control2 control3 test1 test2 test3
gene2      600      500      400   300   200   100
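For comparison, the loop from the question can also be made to work. The original attempt compares min(control) and max(test) over the whole table rather than row i, and it appends to `list`, which was never initialized. The following is only a minimal sketch of a corrected loop, not a recommended replacement for the one-liner above; the names `keep` and `kept_rows` are made up for illustration.

```r
# Rebuild the example data from the question
gene1 <- c(100, 200, 300, 400, 500, 600)
gene2 <- c(600, 500, 400, 300, 200, 100)
gene3 <- c(100, 200, 400, 300, 500, 600)
data <- as.data.frame(rbind(gene1, gene2, gene3))
colnames(data) <- c("control1", "control2", "control3", "test1", "test2", "test3")

control <- data[, 1:3]
test <- data[, 4:6]

# One TRUE/FALSE flag per row, filled in by the loop
keep <- logical(nrow(data))
for (i in seq_len(nrow(data))) {
  # Index row i so that only this gene's values are compared
  keep[i] <- min(control[i, ]) > max(test[i, ])
}

# Hypothetical result name: the rows that passed the check
kept_rows <- data[keep, ]
print(kept_rows)
```

Preallocating the logical vector and subsetting once at the end avoids growing an object with rbind() on every iteration, which tends to matter with 10,000+ rows.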

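Here are two options: one in base R, and another that does a more verbose reshape with tidyr and dplyr. Both let you work without hard-coding anything, using a regex to separate the columns instead.

For the first, you could of course do the grep bit inside the apply calls; I split it out to make it clearer.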

library(dplyr)
control_cols <- grep("control", names(data))
test_cols <- grep("test", names(data))
data[apply(data[control_cols], 1, min) > apply(data[test_cols], 1, max), ]
#>       control1 control2 control3 test1 test2 test3
#> gene2      600      500      400   300   200   100
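If apply() over rows turns out to be slow on the full table, one possible variant is to compute the row-wise minimum and maximum with pmin() and pmax(), which work across whole columns at once. This is a sketch under the same assumptions as above (column names containing "control" and "test"); the names `row_min_control`, `row_max_test` and `result` are made up for illustration.

```r
# Rebuild the example data from the question
gene1 <- c(100, 200, 300, 400, 500, 600)
gene2 <- c(600, 500, 400, 300, 200, 100)
gene3 <- c(100, 200, 400, 300, 500, 600)
data <- as.data.frame(rbind(gene1, gene2, gene3))
colnames(data) <- c("control1", "control2", "control3", "test1", "test2", "test3")

control_cols <- grep("control", names(data))
test_cols <- grep("test", names(data))

# pmin()/pmax() take one vector per column and return the
# element-wise (here: per-row) minimum / maximum
row_min_control <- do.call(pmin, as.list(data[control_cols]))
row_max_test <- do.call(pmax, as.list(data[test_cols]))

# Keep rows where the smallest control value beats the largest test value
result <- data[row_min_control > row_max_test, ]
print(result)
```

Whether this is actually faster than apply() on a given dataset would need to be measured; the filtering logic is the same.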

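More verbose, but possibly more flexible (for example, if you have more types than just control and test, or some other set of comparisons), is to reshape so that you have one column of all the control values and one column of all the test values, do the comparison by gene, then reshape back to wide.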

data %>%
  tibble::rownames_to_column("gene") %>%
  # split names like "control1" into the group (".value") and the replicate number
  tidyr::pivot_longer(-gene, names_to = c(".value", "num"),
                      names_pattern = "(^[a-z]+)(\\d+$)") %>%
  group_by(gene) %>%
  filter(min(control) > max(test)) %>%
  tidyr::pivot_wider(names_from = num, values_from = c(control, test),
                     names_sep = "")
#> # A tibble: 1 × 7
#> # Groups:   gene [1]
#>   gene  control1 control2 control3 test1 test2 test3
#>   <chr>    <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl>
#> 1 gene2      600      500      400   300   200   100
