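I am trying to find a dplyr way to produce relative frequency tables, i.e. proportions, that are weighted by a column in the data. I was reading the thread on relative frequencies/proportions in dplyr, and one of the answers there uses data.table: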
library("data.table")
cars_dt <- as.data.table(mtcars)
cars_dt[, .(n = .N), keyby = .(am, gear)][, freq := prop.table(n) , by = "am"]
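I understand how to weight this approach: replace .N with sum(weight). What I can't see is how to put weights into the dplyr approaches covered in that other thread.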
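For reference, the weighted data.table variant described above might look like the sketch below. The column `w` is a hypothetical weight made up for illustration, since mtcars carries no survey weights; with real data it would be the actual weight column.

```r
library(data.table)

cars_dt <- as.data.table(mtcars)

# Hypothetical weights, invented purely for illustration
cars_dt[, w := rep(c(1, 2), length.out = .N)]

# Same chain as the unweighted version, with sum(w) in place of .N
weighted_dt <- cars_dt[, .(n = sum(w)), keyby = .(am, gear)][, freq := prop.table(n), by = "am"]
weighted_dt
```

Within each value of am the freq column sums to 1, just as in the unweighted case.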
A colleague gave me the answer:
library(survey)
# load data
data(api)
x <- apistrat
## data.table
x <- data.table::data.table(x)
# unweighted: share of schools meeting the target (comp.imp), within year-round status (yr.rnd)
x[ , .(p = .N) , keyby = .(comp.imp, yr.rnd)][ , .(comp.imp, per = p/sum(p)) , by = yr.rnd ]
   yr.rnd comp.imp       per
1:     No       No 0.4413408
2:     No      Yes 0.5586592
3:    Yes       No 0.2380952
4:    Yes      Yes 0.7619048
# weighted (weight is pw)
x[ , .(p = sum(pw)) , keyby = .(comp.imp, yr.rnd)][ , .(comp.imp, per = p/sum(p)) , by = yr.rnd ]
   yr.rnd comp.imp       per
1:     No       No 0.3677785
2:     No      Yes 0.6322215
3:    Yes       No 0.1973814
4:    Yes      Yes 0.8026186
## dplyr
library(dplyr)
x %>% group_by(yr.rnd) %>% count(comp.imp , wt = pw) %>% mutate(per = n/sum(n))
# A tibble: 4 x 4
# Groups:   yr.rnd [2]
  yr.rnd comp.imp     n   per
  <fct>  <fct>    <dbl> <dbl>
1 No     No       1965. 0.368
2 No     Yes      3378. 0.632
3 Yes    No        168. 0.197
4 Yes    Yes       684. 0.803
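To make the mechanics easier to check by hand, here is a minimal sketch of the same pattern on a toy data frame. The data frame `df` and its columns `grp`, `flag` and `w` are made up for illustration and are not part of the api data. It also shows that `count(..., wt = w)` does the same job as `summarise(n = sum(w))`.

```r
library(dplyr)

# Toy data: grp is the conditioning variable, flag the outcome, w the weight
df <- data.frame(
  grp  = c("a", "a", "a", "b", "b"),
  flag = c("no", "yes", "yes", "no", "yes"),
  w    = c(1, 2, 1, 3, 1)
)

# Weighted proportions of flag within grp, using count() with wt
weighted <- df %>%
  count(grp, flag, wt = w) %>%   # n = sum of w per (grp, flag) cell
  group_by(grp) %>%
  mutate(per = n / sum(n)) %>%   # divide by the weighted total of each grp
  ungroup()

# Equivalent spelling with an explicit sum of the weights
weighted2 <- df %>%
  group_by(grp, flag) %>%
  summarise(n = sum(w), .groups = "drop_last") %>%
  mutate(per = n / sum(n)) %>%
  ungroup()

weighted
```

In group a the weights are 1 for "no" and 2 + 1 = 3 for "yes", so per comes out as 0.25 and 0.75; in group b it is 0.75 and 0.25. Dropping `wt = w` from `count()` gives the unweighted proportions instead.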