r - Relative weighted frequencies / proportions with dplyr



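I am looking for a dplyr way to produce tables of relative frequencies or proportions that are weighted by a column in the data. I have been reading the thread on relative frequencies / proportions in dplyr, where one of the answers uses data.table: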

library("data.table")
cars_dt <- as.data.table(mtcars)
cars_dt[, .(n = .N), keyby = .(am, gear)][, freq := prop.table(n) , by = "am"]

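I understand how to weight this approach: replace .N with sum(weight). What I cannot see is how to bring weights into the dplyr approaches described in that other thread.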
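A minimal sketch of that substitution, applied to the data.table code above. It uses the wt column of mtcars purely as a stand-in weight; that is an assumption made for illustration, since wt is a car's weight and not a survey weight.

```r
library(data.table)

cars_dt <- as.data.table(mtcars)

# Sum the stand-in weight per (am, gear) cell instead of counting rows with .N
w_tab <- cars_dt[, .(n = sum(wt)), keyby = .(am, gear)]

# Normalise within each am group so the weighted shares sum to 1 per group
w_tab[, freq := prop.table(n), by = "am"]

print(w_tab)
```

With a real weight column, only the name inside sum() would change.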

A colleague gave me the answer:

library(survey)
library(dplyr)  # needed for %>%, group_by(), count() and mutate() in the dplyr step
# load data
data(api)
x <- apistrat
## data.table
x <- data.table::data.table(x)
# unweighted proportion of schools meeting the target (comp.imp), by year-round status (yr.rnd)
x[ , .(p = .N) , keyby = .(comp.imp, yr.rnd)][ , .(comp.imp, per = p/sum(p)) , by = yr.rnd ]
   yr.rnd comp.imp       per
1:     No       No 0.4413408
2:     No      Yes 0.5586592
3:    Yes       No 0.2380952
4:    Yes      Yes 0.7619048
# weighted (weight is pw)
x[ , .(p = sum(pw)) , keyby = .(comp.imp, yr.rnd)][ , .(comp.imp, per = p/sum(p)) , by = yr.rnd ]
   yr.rnd comp.imp       per
1:     No       No 0.3677785
2:     No      Yes 0.6322215
3:    Yes       No 0.1973814
4:    Yes      Yes 0.8026186
## dplyr
x %>% group_by(yr.rnd) %>% count(comp.imp , wt = pw) %>% mutate(per = n/sum(n))
# A tibble: 4 x 4
# Groups:   yr.rnd [2]
  yr.rnd comp.imp     n   per
  <fct>  <fct>    <dbl> <dbl>
1 No     No       1965. 0.368
2 No     Yes      3378. 0.632
3 Yes    No        168. 0.197
4 Yes    Yes       684. 0.803
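The dplyr line works because count(wt = pw) sums pw within each cell and leaves the result grouped by yr.rnd, so the following mutate() divides by the group total. The sketch below spells the same idea out with summarise() on a small made-up data frame; the names toy, grp, out and w are hypothetical, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: a grouping variable, an outcome, and a weight column
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b"),
  out = c("no", "yes", "yes", "no", "yes"),
  w   = c(1, 2, 3, 4, 6)
)

res <- toy %>%
  group_by(grp, out) %>%
  # weighted cell totals; "drop_last" keeps the result grouped by grp only
  summarise(n = sum(w), .groups = "drop_last") %>%
  # weighted share of each outcome within its grp
  mutate(per = n / sum(n)) %>%
  ungroup()

# The count(wt = ...) form from the answer should give the same shares
res_count <- toy %>%
  group_by(grp) %>%
  count(out, wt = w) %>%
  mutate(per = n / sum(n)) %>%
  ungroup()

print(res)
```

Either form should be usable on the apistrat data by replacing grp, out and w with yr.rnd, comp.imp and pw.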
