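I only recently started coding at all, and I have been stuck on this for several days.
In simplified form: I have two data frames, and what I need to do is essentially this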
(moduletotals$Freq[3] - totals_df$Freq[1])
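That is, subtract the total for the darkorange module, cluster 1, from the freq of "0; 2'-deoxyribonucleotide biosynthesis", and do the same for every row of CCD_ 2.
But I have far too much data to do this by hand, so I need a loop, a function, or something similar that, for a single sample row, finds the total freq of the module and cluster in question.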
> moduletotals$module == totals_df$module &
moduletotals$cluster == totals_df$cluster
print the freq of the matching row, and subtract it from the freq of the totals_df row in question.
I am completely lost here.
moduletotals (only the module column survives): darkgreen, darkgrey, darkorange, darkred, darkturquoise, grey
You can do a left_join between the two data frames:
library(tidyverse)
module <- data.frame(
  stringsAsFactors = FALSE,
  module = c(
    "darkgreen",
    "darkgrey",
    "darkorange",
    "darkred",
    "darkturquoise",
    "grey"
  ),
  cluster = c(1L, 1L, 1L, 1L, 1L, 1L),
  freq = c(12L, 408L, 355L, 11L, 12L, 22L)
)
totals <- data.frame(
  stringsAsFactors = FALSE,
  class_description = c(
    "Adaptions and atypical conditions",
    "Adaptions and atypical conditions",
    "Aerobic",
    "Aerobic",
    "Aerobic",
    "Aerobic",
    "Aerobic"
  ),
  module = c(
    "darkorange",
    "darkgrey",
    "darkgrey",
    "darkorange",
    "grey60",
    "lightyellow",
    "royalblue"
  ),
  cluster = c(1L, 2L, 1L, 1L, 1L, 1L, 1L),
  freq = c(1L, 1L, 4L, 3L, 2L, 3L, 1L)
)
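# Keep every row of totals and attach the matching module total;
# freq.x is the row's own freq, freq.y the module total (0 if no match)
totals %>%
  left_join(module,
            by = c("module", "cluster")) %>%
  replace_na(list(freq.y = 0))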
#> class_description module cluster freq.x freq.y
#> 1 Adaptions and atypical conditions darkorange 1 1 355
#> 2 Adaptions and atypical conditions darkgrey 2 1 0
#> 3 Aerobic darkgrey 1 4 408
#> 4 Aerobic darkorange 1 3 355
#> 5 Aerobic grey60 1 2 0
#> 6 Aerobic lightyellow 1 3 0
#> 7 Aerobic royalblue 1 1 0
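The join above only lines the two frequencies up side by side; the subtraction itself is one more `mutate()`. Here is a minimal sketch, assuming the direction of the expression in the question (`moduletotals$Freq[3] - totals_df$Freq[1]`, i.e. module total minus the row's own freq). The column names `module_total` and `remaining` are made up for illustration; swap the operands if you want the opposite sign.

```r
library(dplyr)
library(tidyr)

# Same example data as above, written compactly
module <- data.frame(
  module = c("darkgreen", "darkgrey", "darkorange",
             "darkred", "darkturquoise", "grey"),
  cluster = c(1L, 1L, 1L, 1L, 1L, 1L),
  freq = c(12L, 408L, 355L, 11L, 12L, 22L),
  stringsAsFactors = FALSE
)
totals <- data.frame(
  class_description = c(rep("Adaptions and atypical conditions", 2),
                        rep("Aerobic", 5)),
  module = c("darkorange", "darkgrey", "darkgrey", "darkorange",
             "grey60", "lightyellow", "royalblue"),
  cluster = c(1L, 2L, 1L, 1L, 1L, 1L, 1L),
  freq = c(1L, 1L, 4L, 3L, 2L, 3L, 1L),
  stringsAsFactors = FALSE
)

result <- totals %>%
  # Rename first so the joined column has a readable name (hypothetical name)
  left_join(rename(module, module_total = freq),
            by = c("module", "cluster")) %>%
  # Rows with no matching module/cluster get a total of 0
  replace_na(list(module_total = 0L)) %>%
  # Assumed direction: module total minus the row's own freq
  mutate(remaining = module_total - freq)

print(result)
```

For the first row this gives 355 - 1 = 354. Rows with no match in `module` end up negative, because their module total was filled with 0; drop the `replace_na()` step if you would rather see `NA` there.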
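# The other way round: sum totals per module and cluster first,
# then attach those sums to module (freq.x = module total, freq.y = summed rows)
module %>%
  left_join(
    totals %>%
      group_by(module, cluster) %>%
      summarise(freq = sum(freq),
                .groups = "drop"),
    by = c("module", "cluster")
  ) %>%
  replace_na(list(freq.y = 0))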
#> module cluster freq.x freq.y
#> 1 darkgreen 1 12 0
#> 2 darkgrey 1 408 4
#> 3 darkorange 1 355 4
#> 4 darkred 1 11 0
#> 5 darkturquoise 1 12 0
#> 6 grey 1 22 0
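Since the question asks for a loop or a function, the same lookup can also be wrapped in a small base R function without any join. This is only a sketch: `subtract_module_totals` and the `remaining` column are hypothetical names, it assumes both data frames have `module`, `cluster` and `freq` columns, and it assumes the result wanted is module total minus the row's own freq. It uses `match()` on a combined module/cluster key, which is the vectorised version of the `moduletotals$module == totals_df$module & ...` test in the question.

```r
# Same example data as above, written compactly
module <- data.frame(
  module = c("darkgreen", "darkgrey", "darkorange",
             "darkred", "darkturquoise", "grey"),
  cluster = c(1L, 1L, 1L, 1L, 1L, 1L),
  freq = c(12L, 408L, 355L, 11L, 12L, 22L),
  stringsAsFactors = FALSE
)
totals <- data.frame(
  class_description = c(rep("Adaptions and atypical conditions", 2),
                        rep("Aerobic", 5)),
  module = c("darkorange", "darkgrey", "darkgrey", "darkorange",
             "grey60", "lightyellow", "royalblue"),
  cluster = c(1L, 2L, 1L, 1L, 1L, 1L, 1L),
  freq = c(1L, 1L, 4L, 3L, 2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row of `rows`, look up the total freq of its
# module/cluster in `module_totals` and subtract the row's own freq from it.
subtract_module_totals <- function(rows, module_totals) {
  # Build one key per row from module and cluster
  key_rows   <- paste(rows$module, rows$cluster, sep = "|")
  key_totals <- paste(module_totals$module, module_totals$cluster, sep = "|")
  # Position of each row's key in module_totals (NA when there is no match)
  idx <- match(key_rows, key_totals)
  total <- module_totals$freq[idx]
  total[is.na(total)] <- 0L  # treat missing modules as a total of 0
  rows$remaining <- total - rows$freq
  rows
}

out <- subtract_module_totals(totals, module)
print(out)
```

No explicit loop is needed, because `match()` handles every row in one call. Note that `match()` only returns the first matching position, so this relies on each module/cluster pair appearing once in the totals table, as it does in the example data.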