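Suppose this is my df1 and I want to create df2.
0.67 is the proportion of x among the Sat rows of group A (2 out of 3), and so on.
I'm stuck on how to first group df1 by grp1, then group again by grp2 within each grp1 group, and then find n and % for each obs value within that subgroup.
Also note that if a value has no observations in the final subgroup, it should be assigned 0.
I know I should provide my attempt before asking for help, but I really don't know how to start on this. Any help is appreciated.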
df1 <- data.frame(grp1 = c(rep("A",4),rep("B",3), rep("C",4)),
obs = c("x", "x", "y", "z", "x","y","y", "x", "x","x", "y"),
grp2 = c("Sat", "Sat", "Sat", "Fri", "Sat", "Fri", "Fri", "Sat", "Sat", "Sat", "Fri"))
> df1
grp1 obs grp2
1 A x Sat
2 A x Sat
3 A y Sat
4 A z Fri
5 B x Sat
6 B y Fri
7 B y Fri
8 C x Sat
9 C x Sat
10 C x Sat
11 C y Fri
df2
grp1 obs grp2 n percent
1 A x Sat 2 0.67
2 A y Sat 1 0.33
3 A z Sat 0 0.00
4 A x Fri 0 0.00
5 A y Fri 0 0.00
6 A z Fri 1 1.00
7 B x Sat 1 1.00
8 B y Sat 0 0.00
9 B z Sat 0 0.00
10 B x Fri 0 0.00
11 B y Fri 2 1.00
12 B z Fri 0 0.00
13 C x Sat 3 1.00
14 C y Sat 0 0.00
15 C z Sat 0 0.00
16 C x Fri 0 0.00
17 C y Fri 1 1.00
18 C z Fri 0 0.00
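Maybe this helps: get the frequency count across all columns, expand the rows to fill in the missing combinations, then compute 'percent' by taking proportions of the 'n' column after grouping by the 'grp' columns.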
library(dplyr)
library(tidyr)
df1 %>%
count(across(everything())) %>%
complete(grp1, obs, grp2, fill = list(n = 0)) %>%
group_by(grp1, grp2) %>%
mutate(percent = proportions(n)) %>%
ungroup
-output
# A tibble: 18 × 5
grp1 obs grp2 n percent
<chr> <chr> <chr> <int> <dbl>
1 A x Fri 0 0
2 A x Sat 2 0.667
3 A y Fri 0 0
4 A y Sat 1 0.333
5 A z Fri 1 1
6 A z Sat 0 0
7 B x Fri 0 0
8 B x Sat 1 1
9 B y Fri 2 1
10 B y Sat 0 0
11 B z Fri 0 0
12 B z Sat 0 0
13 C x Fri 0 0
14 C x Sat 3 1
15 C y Fri 1 1
16 C y Sat 0 0
17 C z Fri 0 0
18 C z Sat 0 0
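For comparison, the same count, complete, and proportion steps can be sketched in base R. This is only an illustration, not part of the answer above: it assumes R >= 4.0.1 (for `proportions()`), and it assumes every grp1/grp2 pair has at least one row, since an empty pair would give 0/0 = NaN rather than 0.

```r
df1 <- data.frame(
  grp1 = c(rep("A", 4), rep("B", 3), rep("C", 4)),
  obs  = c("x", "x", "y", "z", "x", "y", "y", "x", "x", "x", "y"),
  grp2 = c("Sat", "Sat", "Sat", "Fri", "Sat", "Fri", "Fri", "Sat", "Sat", "Sat", "Fri")
)

# table() counts every grp1 x obs x grp2 combination,
# so combinations that never occur already appear with a count of 0
tab <- table(df1)

# Long format: one row per combination, counts stored in column 'n'
df2 <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)

# Proportions within each grp1/grp2 pair (dimensions 1 and 3 of the table);
# as.vector() yields the same column-major order as as.data.frame() above
df2$percent <- as.vector(proportions(tab, margin = c(1, 3)))

# Optional: sort rows for readability
df2 <- df2[order(df2$grp1, df2$grp2, df2$obs), ]
rownames(df2) <- NULL
df2
```

The row order differs from both the desired df2 and the tibble output, but the n and percent values for each grp1/obs/grp2 combination should match.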