R: Extract rows from a list of data frames and bind the file name to the rows that meet a condition



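Given a list of inputs, extract the rows where the column p.value < 0.005, and output a data frame containing the file name as column 1 followed by the extracted rows.

Input: a list of files, read in as data frames A, B, C, etc.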

A.
col1, col2, col3, p.value
X     X      X      0.05
X     X      X      0.001
B.
col1, col2, col3, p.value
X     X      X      0.03
X     X      X      0.01
C. 
col1, col2, col3, p.value
X     X      X      0.1
X     X      X      0.0005
output.
Name, col1, col2, col3, p.value
A      X     X     X     0.001
C      X     X     X     0.0005
files = list.files(".", pattern="\\.assoc$")
data1=lapply(files, read.table, header=FALSE, sep=",")
data2 <- lapply(data1, function(x) {i <- which(x$p.value<0.005)
if (length(i) > 0) x[i, ] else NA })
for (i in 1:length(data2)){
data2[[i]]<-cbind(data2[[i]],files[i])}
data_rbind <- do.call("rbind", data2) 
colnames(data_rbind)[c(1:5)]<-c("Name", "Col1", "Col2", "Col3", "p.value")

The problem occurs in the lines below: every element of the resulting list is NA, when that should not be the case.

data2 <- lapply(data1, function(x) {i <- which(x$p.value<0.005)
if (length(i) > 0) x[i, ] else NA })
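
One plausible cause, assuming the .assoc files begin with a header line as the sample input suggests: with header=FALSE, read.table assigns the default names V1 to V4, so x$p.value is NULL and which() finds nothing in any file. The sketch below reproduces this with a hypothetical temporary file; the file contents and the names tmp_file, no_header and with_header are illustrative only.

```r
# Hypothetical file contents mirroring data frame A from the question.
tmp_file <- tempfile(fileext = ".assoc")
writeLines(c("col1,col2,col3,p.value",
             "X,X,X,0.05",
             "X,X,X,0.001"), tmp_file)

# Read the same file with and without treating line 1 as a header.
no_header   <- read.table(tmp_file, header = FALSE, sep = ",")
with_header <- read.table(tmp_file, header = TRUE,  sep = ",")

names(no_header)                    # "V1" "V2" "V3" "V4": no p.value column
is.null(no_header$p.value)          # TRUE, so which(NULL < 0.005) is empty
which(with_header$p.value < 0.005)  # 2: the second row passes the threshold
```

If the real files have no header line, the column would instead need to be addressed by its default name or position.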

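We loop over the named list with lapply and subset the rows based on the condition on the "p.value" column, use Filter to drop the list elements with 0 rows, then use Map to create the "Name" column from the names of the filtered list elements, and finally rbind the pieces to create a single dataset.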

tmp <- Filter(nrow, lapply(data1, subset, subset = p.value < 0.005))
do.call(rbind, unname(Map(cbind,  Name = names(tmp), tmp)))

-Output

#    Name col1 col2 col3 p.value
#2        A    X    X    X  0.0010
#21       C    X    X    X  0.0005
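
The one-liner above can be unpacked into separate steps to make each intermediate result visible. This is only a sketch under the assumption that the input is a named list shaped like the sample data; make_df is a hypothetical helper used here to build that data, and plain bracket indexing stands in for subset.

```r
# Hypothetical helper: build a data frame like those in the question.
make_df <- function(p) {
  data.frame(col1 = "X", col2 = "X", col3 = "X", p.value = p)
}
data1 <- list(A = make_df(c(0.05, 0.001)),
              B = make_df(c(0.03, 0.01)),
              C = make_df(c(0.1, 0.0005)))

# Step 1: keep only the rows below the threshold in every data frame.
filtered <- lapply(data1, function(d) d[d$p.value < 0.005, , drop = FALSE])
sapply(filtered, nrow)   # A = 1, B = 0, C = 1

# Step 2: drop elements with zero rows; a row count of 0 counts as FALSE.
kept <- Filter(nrow, filtered)
names(kept)              # "A" "C"

# Step 3: prepend each element's name as a "Name" column.
labelled <- Map(cbind, Name = names(kept), kept)

# Step 4: stack the labelled pieces into one data frame.
result <- do.call(rbind, unname(labelled))
result
```

Dropping the empty elements before binding means a file with no qualifying rows, such as B, simply does not appear in the result instead of contributing an NA row.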

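Alternatively, loop over the list with map from purrr, filter the rows where p.value is less than 0.005, and specify .id to create the new column "Name". When the list is named, .id picks up those names for "Name". The _dfr suffix row-binds the datasets into a single data frame.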

library(dplyr)
library(purrr)
map_dfr(data1, ~ .x %>%
    filter(p.value < 0.005), .id = 'Name')

-Output

#     Name col1 col2 col3 p.value
#1       A    X    X    X  0.0010
#2       C    X    X    X  0.0005
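
Both approaches rely on the list being named, whereas a list built with lapply(files, read.table, ...) has no names. A minimal sketch of one way to name it after the files follows, assuming comma-separated files with a header line; the temporary directory and its two files are hypothetical stand-ins for the real .assoc files.

```r
# Hypothetical setup: two small .assoc files in a temporary directory.
dir_path <- tempfile("assoc_")
dir.create(dir_path)
writeLines(c("col1,col2,col3,p.value", "X,X,X,0.05", "X,X,X,0.001"),
           file.path(dir_path, "A.assoc"))
writeLines(c("col1,col2,col3,p.value", "X,X,X,0.03", "X,X,X,0.01"),
           file.path(dir_path, "B.assoc"))

files <- list.files(dir_path, pattern = "\\.assoc$", full.names = TRUE)

# header = TRUE keeps the "p.value" column name from the first line.
data1 <- lapply(files, read.table, header = TRUE, sep = ",")

# Name each element after its file, dropping directory and extension.
names(data1) <- tools::file_path_sans_ext(basename(files))

names(data1)   # "A" "B"
```

With the names in place, either solution can be applied to data1 unchanged.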

Data

data1 <- list(A = structure(list(col1 = c("X", "X"), col2 = c("X", "X"
), col3 = c("X", "X"), p.value = c(0.05, 0.001)), class = "data.frame", row.names = c(NA, 
-2L)), B = structure(list(col1 = c("X", "X"), col2 = c("X", "X"
), col3 = c("X", "X"), p.value = c(0.03, 0.01)), class = "data.frame", row.names = c(NA, 
-2L)), C = structure(list(col1 = c("X", "X"), col2 = c("X", "X"
), col3 = c("X", "X"), p.value = c(0.1, 5e-04)), class = "data.frame", row.names = c(NA, 
-2L)))
