R - How to compute the mean and SD of FPKM gene counts by group, and combine the mean and SD into one data frame?



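Fortunately, the first step, computing the mean and sd by group, is already done, so I now have the mean and sd results as two separate data frames. What I want to know is how to combine them. Whether the combining method is simple or complicated, the combined data frame itself should be simple.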

Below are my calculation and the only combining method I know. I'd like another way to do the combination. My example data and code are below:

library(dplyr)
library(tidyr)

data<-data.frame(matrix(sample(1:1000,500),20,25))
names(data) <- c(paste0("Gene_", 1:25))
rownames(data)<-NULL
data$Name<-c(rep(paste0("Group_",1:10),each=2))

unique(data$Name)
## 1. group_by + summarise: as written, only one column (Gene_1) per call
mm <- data %>%
  group_by(Name) %>%
  summarise(mean = mean(Gene_1))
mm
## 2. tapply: gives the per-group mean of a column, but only one column at a time
mm <- data.frame(mean_Gene_1=tapply(data[,"Gene_1"],data$Name,mean))  
mm
## 3. aggregate: a powerful function that returns the result for all columns by group
mm <- aggregate(.~Name,data,mean) 
mm

## get the mean and sd data frames
mean_df <- aggregate(. ~ Name, data, mean)
sd_df <- aggregate(. ~ Name, data, sd)

## now combine the two data frames using the shared keys "Name" and "Gene"
## (I learned this method from someone on Stack Overflow)
stats <- bind_rows(list(mean = mean_df, sd = sd_df), .id = "stat")

data_mean_sd <- stats %>%
  pivot_longer(-c(Name, stat), names_to = "Gene", values_to = "value") %>%
  pivot_wider(names_from = "stat", values_from = "value")
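If you would rather stay in base R for the combining step, one option is to put each of the two `aggregate()` tables into long form and join them with `merge()` on `Name` and `Gene`. This is a minimal sketch under stated assumptions, not the method from the question: it assumes the same layout as the example (one character `Name` column plus numeric gene columns), and the helper `to_long()` is a hypothetical name introduced only for illustration.

```r
# Sketch: combine per-group mean and sd tables with base R merge().
# Assumes one character column `Name` plus numeric gene columns.
set.seed(1)
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

mean_df <- aggregate(. ~ Name, data, mean)
sd_df   <- aggregate(. ~ Name, data, sd)

# Hypothetical helper: wide "Name + one column per gene" -> long form,
# one row per Name/Gene pair, statistic stored in a column named `value_name`.
to_long <- function(df, value_name) {
  genes <- setdiff(names(df), "Name")
  out <- data.frame(
    Name  = rep(df$Name, times = length(genes)),  # Names cycle once per gene
    Gene  = rep(genes, each = nrow(df)),          # matches unlist()'s column order
    value = unlist(df[genes], use.names = FALSE)  # column-by-column values
  )
  names(out)[names(out) == "value"] <- value_name
  out
}

# Join the two long tables on the shared keys.
data_mean_sd <- merge(to_long(mean_df, "mean"), to_long(sd_df, "sd"),
                      by = c("Name", "Gene"))
head(data_mean_sd)
```

`merge()` sorts the result by the key columns, so genes appear in character order (`Gene_1`, `Gene_10`, `Gene_11`, ...) rather than numeric order.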

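My code gives the correct result. The real file is large; this is only an example, and it contains a lot of repeated data. I hope someone can show me a better way to simplify this.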

I need your help.
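Another way to avoid computing and then recombining two separate tables is to reshape to long form first and compute both statistics in a single `summarise()`. This is a sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and the same example layout as above.

```r
library(dplyr)
library(tidyr)

set.seed(1)
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

# Reshape once: one row per sample/gene value, keeping the group label.
# Then compute mean and sd per Name/Gene pair in one summarise() call.
data_mean_sd <- data %>%
  pivot_longer(-Name, names_to = "Gene", values_to = "value") %>%
  group_by(Name, Gene) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")

data_mean_sd
```

Because both statistics come from the same grouped data, there is nothing to join afterwards.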

I'm not sure whether this works for you. The last part, using pivot_longer and pivot_wider, is basically the same as yours, but for the summarizing part I use dplyr::across:

library(dplyr)
library(tidyr)
data<-data.frame(matrix(sample(1:1000,500),20,25))
names(data) <- c(paste0("Gene_", 1:25))
rownames(data)<-NULL
data$Name<-c(rep(paste0("Group_",1:10),each=2))

data %>% 
  group_by(Name) %>% 
  summarise(across(everything(),
                   list(mean = ~ mean(.x),
                        sd = ~ sd(.x)),
                   .names = "{col}__{fn}")) %>% 
  pivot_longer(-c(Name), names_to = "Gene", values_to = "value") %>% 
  separate(Gene, into = c("Gene", "Stats"), sep = "__") %>% 
  pivot_wider(names_from = Stats, values_from = "value")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 250 x 4
#>    Name    Gene     mean     sd
#>    <chr>   <chr>   <dbl>  <dbl>
#>  1 Group_1 Gene_1   534. 556.  
#>  2 Group_1 Gene_2   294.  51.6 
#>  3 Group_1 Gene_3   262. 350.  
#>  4 Group_1 Gene_4   615  338.  
#>  5 Group_1 Gene_5    89   43.8 
#>  6 Group_1 Gene_6   322  263.  
#>  7 Group_1 Gene_7   696. 391.  
#>  8 Group_1 Gene_8   182. 101.  
#>  9 Group_1 Gene_9   582  139.  
#> 10 Group_1 Gene_10  184    2.83
#> # ... with 240 more rows

Created on 2021-01-27 by the reprex package (v0.3.0)
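The `separate()` + `pivot_wider()` steps in the answer can likely be folded into `pivot_longer()` itself by splitting the column names with `names_sep` and using the `.value` sentinel, which tells `pivot_longer()` to turn that part of each name (`mean` or `sd`) into an output column. This is a sketch assuming dplyr 1.0.0 or later (for `across()` and `.groups`) and a tidyr version with `pivot_longer()`.

```r
library(dplyr)
library(tidyr)

set.seed(1)
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

data_mean_sd <- data %>%
  group_by(Name) %>%
  # Columns like Gene_1__mean, Gene_1__sd, ... (grouping column is excluded)
  summarise(across(everything(), list(mean = mean, sd = sd),
                   .names = "{col}__{fn}"),
            .groups = "drop") %>%
  # Split "Gene_1__mean" at "__": first part -> Gene, second part names the
  # output column via the ".value" sentinel, so mean and sd become columns.
  pivot_longer(-Name, names_to = c("Gene", ".value"), names_sep = "__")

data_mean_sd
```

The double underscore separator matters here: a single `_` would also split inside names such as `Gene_1`.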