r-使用Dplyr计算行百分比



好吧,我有一个这样的数据帧:

ID Month Year DOW  Value
1  Jan   2019 Fri  20
1  Jan   2019 Sat  39
1  Feb   2019 Fri  30
1  Feb   2019 Sat  24
2  Jan   2019 Fri  20
2  Jan   2019 Sat  12
2  Feb   2019 Fri   1
2  Feb   2019 Sat   3

我的目标是确定每行相对于月份年份的百分比;

手动计算这个答案应该是:

ID Month Year DOW  Value   Percent
1  Jan   2019 Fri  20      .338
1  Jan   2019 Sat  39      .661
1  Feb   2019 Fri  30      .554
1  Feb   2019 Sat  24      .444
2  Jan   2019 Fri  20      .625
2  Jan   2019 Sat  12      .375
2  Feb   2019 Fri   1      .25
2  Feb   2019 Sat   3      .75

请注意,月份-年份组合总是加1。

最后,我想取ID 1和2 刚刚计算的百分比的平均值

Month Year DOW  Avg
Jan   2019 Fri  0.482
Jan   2019 Sat  0.518
Feb   2019 Fri  0.402
Feb   2019 Sat  0.597

目标是使用dplyr

这行吗:

>library(dplyr)
> df %>% group_by(ID, Month, Year) %>% mutate(Percent = Value/sum(Value)) %>% 
group_by(Month, Year,DOW) %>% summarise(Avg = mean(Percent)) %>% as.data.frame()
`summarise()` regrouping output by 'Month', 'Year' (override with `.groups` argument)
Month Year DOW       Avg
1   Feb 2019 Fri 0.4027778
2   Feb 2019 Sat 0.5972222
3   Jan 2019 Fri 0.4819915
4   Jan 2019 Sat 0.5180085
> 

使用的数据:

structure(list(ID = c(1, 1, 1, 1, 2, 2, 2, 2), Month = c("Jan", 
"Jan", "Feb", "Feb", "Jan", "Jan", "Feb", "Feb"), Year = c(2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019), DOW = c("Fri", "Sat", 
"Fri", "Sat", "Fri", "Sat", "Fri", "Sat"), Value = c(20, 39, 
30, 24, 20, 12, 1, 3)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -8L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_double", 
"collector")), Month = structure(list(), class = c("collector_character", 
"collector")), Year = structure(list(), class = c("collector_double", 
"collector")), DOW = structure(list(), class = c("collector_character", 
"collector")), Value = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))
your_data = your_data %>%
group_by(ID, Month, Year) %>%
mutate(Percent = Value / sum(Value))
your_data %>% 
filter(ID %in% c(1, 2)) %>%
group_by(Month, Year, DOW) %>%
summarize(Avg = mean(Percent)

使用base

df$Percent <- with(df, ave(Value, list(ID, Month), FUN = prop.table))
ID Month Year DOW Value   Percent
1  1   Jan 2019 Fri    20 0.3389831
2  1   Jan 2019 Sat    39 0.6610169
3  1   Feb 2019 Fri    30 0.5555556
4  1   Feb 2019 Sat    24 0.4444444
5  2   Jan 2019 Fri    20 0.6250000
6  2   Jan 2019 Sat    12 0.3750000
7  2   Feb 2019 Fri     1 0.2500000
8  2   Feb 2019 Sat     3 0.7500000

最新更新