r - Calculate percentage and frequency of occurrence for two groups using dplyr



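I am learning dplyr and have looked through similar posts for a solution, but have not found this particular combination of problems.

Here is an example data frame: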

set.seed(1)
df <- data.frame(sampleID = c(rep("sample1", 2),
                              rep("sample2", 3),
                              rep("sample3", 4)),
                 species = c("clover", "nettle",
                             "clover", "nettle", "vine",
                             "clover", "clover", "nettle", "vine"),
                 type = c("vegetation", "seed",
                          "vegetation", "vegetation", "vegetation",
                          "seed", "vegetation", "seed", "vegetation"),
                 mass = sample(1:9))
> df
  sampleID species       type mass
1  sample1  clover vegetation    9
2  sample1  nettle       seed    4
3  sample2  clover vegetation    7
4  sample2  nettle vegetation    1
5  sample2    vine vegetation    2
6  sample3  clover       seed    6
7  sample3  clover vegetation    3
8  sample3  nettle       seed    8
9  sample3    vine vegetation    5

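I need to return a data frame that gives the percent mass of each unique species/type combination, and I also need the percent frequency with which each species/type combination occurs across the sampleIDs.

So for this example, the solution for the species/type combination vine/vegetation would be percent mass = (5 + 2) / sum(mass), and the percent frequency would be 2/3, because that combination does not occur in sample1.

To start, I have tried different combinations of something like this: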
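To make the target numbers concrete, here is a minimal base R sketch of that arithmetic for vine/vegetation. The `mass` values are hardcoded from the printed data frame so the result does not depend on the random number generator; `is_vine_veg`, `percmass` and `percfreq` are illustrative names only.

```r
# Data copied from the printed data frame (mass hardcoded, not sampled)
df <- data.frame(
  sampleID = rep(c("sample1", "sample2", "sample3"), times = c(2, 3, 4)),
  species  = c("clover", "nettle", "clover", "nettle", "vine",
               "clover", "clover", "nettle", "vine"),
  type     = c("vegetation", "seed", "vegetation", "vegetation", "vegetation",
               "seed", "vegetation", "seed", "vegetation"),
  mass     = c(9, 4, 7, 1, 2, 6, 3, 8, 5)
)

# Rows belonging to the vine/vegetation combination
is_vine_veg <- df$species == "vine" & df$type == "vegetation"

# Percent mass: mass of the combination over the total mass, (2 + 5) / 45
percmass <- sum(df$mass[is_vine_veg]) / sum(df$mass)

# Percent frequency: samples containing the combination over all samples, 2 / 3
percfreq <- length(unique(df$sampleID[is_vine_veg])) /
  length(unique(df$sampleID))
```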

df %>%
  group_by(species, type) %>%
  summarize(totmass = sum(mass)) %>%
  mutate(percmass = totmass / sum(totmass))

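But this gives 100% mass for vine/vegetation? In addition, I don't know where to go from here to get the percent frequency based on sampleID.

Not sure whether I understood you correctly, but maybe this is what you are looking for: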
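A likely reason for the 100% figure: by default `summarize()` drops only the last grouping variable, so the result is still grouped by `species`, and `sum(totmass)` in the following `mutate()` is a per-species total. Vine has a single type, so its share of its own total is 1. The sketch below assumes dplyr 1.0.0 or later (for the `.groups` argument) and hardcodes `mass` from the printed data frame; `by_species` and `overall` are illustrative names.

```r
library(dplyr)

# Data copied from the printed data frame (mass hardcoded, not sampled)
df <- data.frame(
  sampleID = rep(c("sample1", "sample2", "sample3"), times = c(2, 3, 4)),
  species  = c("clover", "nettle", "clover", "nettle", "vine",
               "clover", "clover", "nettle", "vine"),
  type     = c("vegetation", "seed", "vegetation", "vegetation", "vegetation",
               "seed", "vegetation", "seed", "vegetation"),
  mass     = c(9, 4, 7, 1, 2, 6, 3, 8, 5)
)

# Default behaviour made explicit: only `type` is dropped, so the result
# stays grouped by `species` and sum(totmass) is computed within each species.
by_species <- df %>%
  group_by(species, type) %>%
  summarize(totmass = sum(mass), .groups = "drop_last") %>%
  mutate(percmass = totmass / sum(totmass))

# Dropping all grouping makes sum(totmass) the overall total mass (45 here).
overall <- df %>%
  group_by(species, type) %>%
  summarize(totmass = sum(mass), .groups = "drop") %>%
  mutate(percmass = totmass / sum(totmass))

filter(by_species, species == "vine")$percmass  # 1
filter(overall, species == "vine")$percmass     # 7 / 45
```

Calling `ungroup()` between `summarize()` and `mutate()` should have the same effect as `.groups = "drop"`.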

set.seed(1)
df <- data.frame(sampleID = c(rep("sample1", 2),
                              rep("sample2", 3),
                              rep("sample3", 4)),
                 species = c("clover", "nettle",
                             "clover", "nettle", "vine",
                             "clover", "clover", "nettle", "vine"),
                 type = c("vegetation", "seed",
                          "vegetation", "vegetation", "vegetation",
                          "seed", "vegetation", "seed", "vegetation"),
                 mass = sample(1:9))
library(dplyr)
df %>%
  # Add total mass and total number of samples as constant columns
  mutate(sum_mass = sum(mass),
         nsamples = n_distinct(sampleID)) %>%
  # Add sum_mass and nsamples to group_by so they survive summarize()
  group_by(species, type, sum_mass, nsamples) %>%
  # Per combination: number of samples it occurs in, and its total mass
  summarize(nsample = n_distinct(sampleID),
            totmass = sum(mass), .groups = "drop") %>%
  mutate(percmass = totmass / sum_mass,
         percfreq = nsample / nsamples)
#> # A tibble: 5 x 8
#>   species type       sum_mass nsamples nsample totmass percmass percfreq
#>   <chr>   <chr>         <int>    <int>   <int>   <int>    <dbl>    <dbl>
#> 1 clover  seed             45        3       1       6   0.133     0.333
#> 2 clover  vegetation       45        3       3      19   0.422     1    
#> 3 nettle  seed             45        3       2      12   0.267     0.667
#> 4 nettle  vegetation       45        3       1       1   0.0222    0.333
#> 5 vine    vegetation       45        3       2       7   0.156     0.667
