I have a data frame like this:
df <- data.frame(group1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 group2 = c("X", "X", "Y", "Y", "X", "X", "Y", "Y"),
                 type   = c("a", "b", "c", "d", "e", "f", "g", "h"),
                 count  = c(1, 2, 3, 4, 5, 6, 7, 8))
> df
  group1 group2 type count
1      A      X    a     1
2      A      X    b     2
3      A      Y    c     3
4      A      Y    d     4
5      B      X    e     5
6      B      X    f     6
7      B      Y    g     7
8      B      Y    h     8
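I want to compute the relative frequency of each type within its group1/group2 group and store it as a new column. I have a way to do it, but it's clunky: I have to summarise and then join. I feel there must be a way to do this in a single dplyr pipeline, but what I can't figure out is how to get back to all the rows after summarising.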
My clunky way:
library(dplyr)
df.summ <- df %>%
  group_by(group1, group2) %>%
  summarize(tot = sum(count))
df <- left_join(df, df.summ, by = c("group1", "group2"))
df <- df %>% mutate(freq = count/tot)
> df
  group1 group2 type count tot      freq
1      A      X    a     1   3 0.3333333
2      A      X    b     2   3 0.6666667
3      A      Y    c     3   7 0.4285714
4      A      Y    d     4   7 0.5714286
5      B      X    e     5  11 0.4545455
6      B      X    f     6  11 0.5454545
7      B      Y    g     7  15 0.4666667
8      B      Y    h     8  15 0.5333333
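It is more direct to use mutate instead of summarise/left_join: summarise by default returns only one row per group, whereas mutate adds a new column to the original dataset and keeps every row.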
library(dplyr)
df1 <- df %>%
  group_by(group1, group2) %>%
  mutate(freq = count/sum(count)) %>%
  ungroup()
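To see why mutate is the right verb here, the sketch below re-creates the question's df and compares the row counts of a grouped summarise and a grouped mutate. It also shows the per-operation .by argument, which assumes dplyr 1.1.0 or later; on older versions, stick with group_by()/ungroup() as above.

```r
library(dplyr)

# Re-create the example data from the question
df <- data.frame(group1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 group2 = c("X", "X", "Y", "Y", "X", "X", "Y", "Y"),
                 type   = c("a", "b", "c", "d", "e", "f", "g", "h"),
                 count  = c(1, 2, 3, 4, 5, 6, 7, 8))

# summarise() collapses each (group1, group2) group to a single row
summ <- df %>%
  group_by(group1, group2) %>%
  summarise(tot = sum(count), .groups = "drop")

# mutate() keeps every original row and adds the group-wise result
df1 <- df %>%
  group_by(group1, group2) %>%
  mutate(freq = count / sum(count)) %>%
  ungroup()

# Same result with per-operation grouping (assumes dplyr >= 1.1.0);
# the output is already ungrouped, so no ungroup() call is needed
df2 <- df %>%
  mutate(freq = count / sum(count), .by = c(group1, group2))

nrow(summ)  # 4: one row per group
nrow(df1)   # 8: all original rows
```

The .by form is convenient when the grouping is only needed for one step, since there is no leftover grouping to forget about.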
A base R one-liner:
df$freq <- with(df, ave(count, list(group1, group2), FUN = function(x) x/sum(x)))
df
#  group1 group2 type count      freq
#1      A      X    a     1 0.3333333
#2      A      X    b     2 0.6666667
#3      A      Y    c     3 0.4285714
#4      A      Y    d     4 0.5714286
#5      B      X    e     5 0.4545455
#6      B      X    f     6 0.5454545
#7      B      Y    g     7 0.4666667
#8      B      Y    h     8 0.5333333
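A slightly shorter base R variant, offered as a sketch: ave() also accepts the grouping variables as separate arguments, and prop.table() applied to a plain vector returns x / sum(x), so it can stand in for the anonymous function in the one-liner above.

```r
df <- data.frame(group1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 group2 = c("X", "X", "Y", "Y", "X", "X", "Y", "Y"),
                 type   = c("a", "b", "c", "d", "e", "f", "g", "h"),
                 count  = c(1, 2, 3, 4, 5, 6, 7, 8))

# ave() splits count by the interaction of group1 and group2,
# applies prop.table() to each piece, and puts the results back
# in the original row order
df$freq <- with(df, ave(count, group1, group2, FUN = prop.table))

# Sanity check: within every (group1, group2) group the frequencies sum to 1
tapply(df$freq, list(df$group1, df$group2), sum)
```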