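Fortunately, the first step, calculating the mean and sd by group, is already done, so I now have the mean and sd results as two separate data frames. What I want to know is how to combine them. However simple or complicated the combining method is, the combined data frame itself should be simple.
Here I will show you how I did the calculation, along with the only combining method I know. I would like to see other ways to do the combining. My example data and code are as follows: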
library(dplyr)
library(tidyr)
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
rownames(data) <- NULL
data$Name <- rep(paste0("Group_", 1:10), each = 2)
unique(data$Name)
## 1. group_by + summarise: gives the result for only one column per call.
mm <- data %>%
  group_by(Name) %>%
  summarise(mean = mean(Gene_1))
mm
## 2. tapply: gives the mean by group, but still only one column per call.
mm <- data.frame(mean_Gene_1 = tapply(data[, "Gene_1"], data$Name, mean))
mm
## 3. aggregate: a powerful function that gives the result for all columns by group.
mm <- aggregate(. ~ Name, data, mean)
mm
## Get the mean and sd data frames.
## (Named mean_df / sd_df so they do not shadow the mean() and sd() functions.)
mean_df <- aggregate(. ~ Name, data, mean)
sd_df <- aggregate(. ~ Name, data, sd)
## Now combine the two data frames using the shared keys "Name" and "Gene".
## This is the one method I learned from somebody on Stack Overflow.
combined <- bind_rows(list(mean = mean_df, sd = sd_df), .id = "stat")
data_mean_sd <- combined %>%
  pivot_longer(-c(Name, stat), names_to = "Gene", values_to = "value") %>%
  pivot_wider(names_from = "stat", values_from = "value")
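As you can see, the result is correct. It is a big file, and this is just an example; it contains a lot of repeated data. I hope someone can show me a better way to simplify this. I would appreciate your help.

I'm not sure, but would the following approach work for you? The last part, using pivot_longer and pivot_wider, is basically the same, but for the summarising step I use dplyr::across.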
library(dplyr)
library(tidyr)
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
rownames(data) <- NULL
data$Name <- rep(paste0("Group_", 1:10), each = 2)
data %>%
  group_by(Name) %>%
  summarise(across(everything(),
                   list(mean = ~ mean(.x),
                        sd = ~ sd(.x)),
                   .names = "{col}__{fn}")) %>%
  pivot_longer(-c(Name), names_to = "Gene", values_to = "value") %>%
  separate(Gene, into = c("Gene", "Stats"), sep = "__") %>%
  pivot_wider(names_from = Stats, values_from = "value")
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 250 x 4
#>    Name    Gene     mean     sd
#>    <chr>   <chr>   <dbl>  <dbl>
#>  1 Group_1 Gene_1   534. 556.
#>  2 Group_1 Gene_2   294.  51.6
#>  3 Group_1 Gene_3   262. 350.
#>  4 Group_1 Gene_4   615  338.
#>  5 Group_1 Gene_5    89   43.8
#>  6 Group_1 Gene_6   322  263.
#>  7 Group_1 Gene_7   696. 391.
#>  8 Group_1 Gene_8   182. 101.
#>  9 Group_1 Gene_9   582  139.
#> 10 Group_1 Gene_10  184    2.83
#> # ... with 240 more rows
Created on 2021-01-27 by the reprex package (v0.3.0)
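For comparison, here is a minimal sketch of two shorter variants of the same idea. It is not a definitive implementation, and it assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later. The `set.seed(1)` call and the names `res_long_first` and `res_value` are my own additions for illustration and are not part of the original post. Variant A reshapes to long format first and summarises once, so nothing needs combining afterwards. Variant B keeps `across()` but uses the `.value` sentinel of `pivot_longer()` in place of the `separate()` and `pivot_wider()` steps.

```r
library(dplyr)
library(tidyr)

# Example data with the same shape as above: 25 gene columns, 10 groups of 2 rows.
set.seed(1)  # assumption: fixed seed, added only so the sketch is reproducible
data <- data.frame(matrix(sample(1:1000, 500), 20, 25))
names(data) <- paste0("Gene_", 1:25)
data$Name <- rep(paste0("Group_", 1:10), each = 2)

# Variant A: reshape to long format first, then summarise once per (Name, Gene).
res_long_first <- data %>%
  pivot_longer(-Name, names_to = "Gene", values_to = "value") %>%
  group_by(Name, Gene) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")

# Variant B: summarise with across(), then let pivot_longer() split the
# "Gene_1__mean"-style names: the part before "__" goes into Gene, and the
# part after it (".value") becomes the output column name (mean or sd).
res_value <- data %>%
  group_by(Name) %>%
  summarise(across(everything(), list(mean = mean, sd = sd),
                   .names = "{.col}__{.fn}"),
            .groups = "drop") %>%
  pivot_longer(-Name, names_to = c("Gene", ".value"), names_sep = "__")

# Compare the two variants after sorting the rows the same way.
all.equal(arrange(res_long_first, Name, Gene), arrange(res_value, Name, Gene))
```

Both variants should give one row per Name and Gene pair with a mean and an sd column. Variant A is probably the easier one to read; Variant B avoids building the long table before summarising, which may matter for a large file.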