Add a row containing the mean of certain rows to each data frame in a list



I have a list of data frames called my_list. Below is an example of one data frame in my_list.

> print(df1)
            A         B     Names
1   0.8262825  0.734412  Baseline
2   1.0100000  0.734412  Sample1
3   0.8262825  0.734412  Sample2
4   1.0100000  0.734412  Sample3
5   0.8262825  0.734412  Sample4
6   1.0100000  0.734412  Sample5
7   0.8262825  0.734412  Sample6
8   1.0100000  0.734412  Sample7
9   0.8262825  0.734412  Sample8
10  1.0100000  0.734412  Sample9
11  0.8262825  0.734412  Sample10
12  1.0100000        NA  AASHTO

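I want to add a new row to each data frame in my_list that contains the mean of columns A and B, excluding the rows whose Names value is "Baseline" or "AASHTO" (so only the mean of the rows Sample1 through Sample10).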

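Finally, I want to set the Names column as the row names of each data frame in my_list and then drop the Names column from every data frame in the list.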

The expected result for each data frame in my_list would be

                  A         B
Baseline  0.8262825  0.734412
Sample1   1.0100000  0.734412
Sample2   0.8262825  0.734412
Sample3   1.0100000  0.734412
Sample4   0.8262825  0.734412
Sample5   1.0100000  0.734412
Sample6   0.8262825  0.734412
Sample7   1.0100000  0.734412
Sample8   0.8262825  0.734412
Sample9   1.0100000  0.734412
Sample10  0.8262825  0.734412
Mean      0.8156500  0.734412
AASHTO    1.0100000        NA

I would really appreciate your help.

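We can loop over the list with lapply, compute the colMeans of columns "A" and "B" while excluding the rows whose Names is "Baseline" or "AASHTO", and then rbind the result to the original data frame.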

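lst2 <- lapply(lst1, function(x) {
  # column means of A and B over the Sample rows only
  means <- colMeans(x[!x$Names %in% c("Baseline", "AASHTO"),
                      c('A', 'B')], na.rm = TRUE)
  # append the means as a new row labelled "Mean"
  d1 <- rbind(x, data.frame(Names = "Mean", as.list(means)))
  # use Names as row names, then drop the Names column
  row.names(d1) <- d1$Names
  d1[setdiff(names(d1), "Names")]
})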
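
The lapply code above appends the Mean row at the end, after AASHTO, whereas the expected output in the question shows Mean just before AASHTO. If that order matters, one possible variation is sketched below. The helper name add_mean_row, its arguments, and the small toy data frame are assumptions made for illustration; they are not part of the original answer, and the sketch assumes R 4.0 or later so that Names is a character column.

```r
# Toy list with one data frame shaped like the question's data (made-up values).
lst1 <- list(data.frame(
  A = c(0.8, 1.0, 0.6, 1.2),
  B = c(0.5, 0.7, 0.9, NA),
  Names = c("Baseline", "Sample1", "Sample2", "AASHTO")
))

# Hypothetical helper: insert a "Mean" row just before the `before` row.
add_mean_row <- function(x, exclude = c("Baseline", "AASHTO"), before = "AASHTO") {
  keep <- !x$Names %in% exclude
  means <- colMeans(x[keep, c("A", "B")], na.rm = TRUE)
  mean_row <- data.frame(Names = "Mean", as.list(means))
  pos <- match(before, x$Names)        # position of the first `before` row
  if (is.na(pos)) pos <- nrow(x) + 1   # no such row: append at the end
  idx <- seq_len(nrow(x))
  out <- rbind(x[idx < pos, ], mean_row, x[idx >= pos, ])
  row.names(out) <- out$Names          # Names become the row names
  out[setdiff(names(out), "Names")]    # drop the Names column
}

lst2 <- lapply(lst1, add_mean_row)
```

With the toy data, the row names of lst2[[1]] come out in the order Baseline, Sample1, Sample2, Mean, AASHTO.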

Or using tidyverse

library(dplyr)
library(purrr)
library(tibble)

map(lst1, ~ .x %>%
      # append a "Mean" row computed from the Sample rows only
      add_row(Names = 'Mean',
              A = mean(.$A[!.$Names %in% c("Baseline", "AASHTO")], na.rm = TRUE),
              B = mean(.$B[!.$Names %in% c("Baseline", "AASHTO")], na.rm = TRUE)) %>%
      # reset the existing row names before promoting Names to row names
      `row.names<-`(., NULL) %>%
      column_to_rownames('Names'))
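
The tidyverse version above names columns A and B explicitly. If the data frames had more numeric columns, a variation along the following lines might avoid repeating the mean() call for each one. This is only a sketch: it assumes dplyr 1.0.0 or later (for across()), and the toy data frame and the object names mean_row and out are made up for illustration.

```r
library(dplyr)
library(tibble)

# Toy data frame shaped like the question's data (made-up values).
df <- data.frame(
  A = c(0.8, 1.0, 0.6, 1.2),
  B = c(0.5, 0.7, 0.9, NA),
  Names = c("Baseline", "Sample1", "Sample2", "AASHTO")
)

# One-row summary: mean of every numeric column over the Sample rows only.
mean_row <- df %>%
  filter(!Names %in% c("Baseline", "AASHTO")) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  mutate(Names = "Mean")

# Append the summary row, then move Names into the row names.
out <- bind_rows(df, mean_row) %>%
  column_to_rownames("Names")
```

To apply this to a whole list, the same steps could be wrapped in a function and passed to lapply or purrr::map.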

Data

lst1 <- list(structure(list(A = c(0.8262825, 1.01, 0.8262825, 1.01, 0.8262825, 
1.01, 0.8262825, 1.01, 0.8262825, 1.01, 0.8262825, 1.01), B = c(0.734412, 
0.734412, 0.734412, 0.734412, 0.734412, 0.734412, 0.734412, 0.734412, 
0.734412, 0.734412, 0.734412, NA), Names = c("Baseline", "Sample1", 
"Sample2", "Sample3", "Sample4", "Sample5", "Sample6", "Sample7", 
"Sample8", "Sample9", "Sample10", "AASHTO")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12")), structure(list(
A = c(0.8262825, 1.01, 0.8262825, 1.01, 0.8262825, 1.01, 
0.8262825, 1.01, 0.8262825, 1.01, 0.8262825, 1.01), B = c(0.734412, 
0.734412, 0.734412, 0.734412, 0.734412, 0.734412, 0.734412, 
0.734412, 0.734412, 0.734412, 0.734412, NA), Names = c("Baseline", 
"Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6", 
"Sample7", "Sample8", "Sample9", "Sample10", "AASHTO")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12")))
