好吧,我有一个这样的数据帧:
ID Month Year DOW Value
1 Jan 2019 Fri 20
1 Jan 2019 Sat 39
1 Feb 2019 Fri 30
1 Feb 2019 Sat 24
2 Jan 2019 Fri 20
2 Jan 2019 Sat 12
2 Feb 2019 Fri 1
2 Feb 2019 Sat 3
我的目标是确定每行相对于月份年份的百分比;
手动计算这个答案应该是:
ID Month Year DOW Value Percent
1 Jan 2019 Fri 20 .338
1 Jan 2019 Sat 39 .661
1 Feb 2019 Fri 30 .554
1 Feb 2019 Sat 24 .444
2 Jan 2019 Fri 20 .625
2 Jan 2019 Sat 12 .375
2 Feb 2019 Fri 1 .25
2 Feb 2019 Sat 3 .75
请注意,月份-年份组合总是加1。
最后,我想取ID 1和2 刚刚计算的百分比的平均值
Month Year DOW Avg
Jan 2019 Fri 0.482
Jan 2019 Sat 0.518
Feb 2019 Fri 0.402
Feb 2019 Sat 0.597
目标是使用dplyr
这行吗:
>library(dplyr)
> df %>% group_by(ID, Month, Year) %>% mutate(Percent = Value/sum(Value)) %>%
group_by(Month, Year,DOW) %>% summarise(Avg = mean(Percent)) %>% as.data.frame()
`summarise()` regrouping output by 'Month', 'Year' (override with `.groups` argument)
Month Year DOW Avg
1 Feb 2019 Fri 0.4027778
2 Feb 2019 Sat 0.5972222
3 Jan 2019 Fri 0.4819915
4 Jan 2019 Sat 0.5180085
>
使用的数据:
structure(list(ID = c(1, 1, 1, 1, 2, 2, 2, 2), Month = c("Jan",
"Jan", "Feb", "Feb", "Jan", "Jan", "Feb", "Feb"), Year = c(2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019), DOW = c("Fri", "Sat",
"Fri", "Sat", "Fri", "Sat", "Fri", "Sat"), Value = c(20, 39,
30, 24, 20, 12, 1, 3)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -8L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_double",
"collector")), Month = structure(list(), class = c("collector_character",
"collector")), Year = structure(list(), class = c("collector_double",
"collector")), DOW = structure(list(), class = c("collector_character",
"collector")), Value = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
your_data = your_data %>%
group_by(ID, Month, Year) %>%
mutate(Percent = Value / sum(Value))
your_data %>%
filter(ID %in% c(1, 2)) %>%
group_by(Month, Year, DOW) %>%
summarize(Avg = mean(Percent)
使用base
df$Percent <- with(df, ave(Value, list(ID, Month), FUN = prop.table))
ID Month Year DOW Value Percent
1 1 Jan 2019 Fri 20 0.3389831
2 1 Jan 2019 Sat 39 0.6610169
3 1 Feb 2019 Fri 30 0.5555556
4 1 Feb 2019 Sat 24 0.4444444
5 2 Jan 2019 Fri 20 0.6250000
6 2 Jan 2019 Sat 12 0.3750000
7 2 Feb 2019 Fri 1 0.2500000
8 2 Feb 2019 Sat 3 0.7500000