A data frame and some variables:
library(tidyverse)
library(lubridate)
budget_2020_q4 <- 1000000
budget_2021_q1 <- 2000000
budget_2021_q2 <- 3000000
budget_2021_q3 <- 3000000
budget_2021_q4 <- 2000000
calendar <- data.frame(
  cohort = seq('2020-10-01' %>% ymd, '2021-12-31' %>% ymd, by = '1 days')) %>%
  mutate(Quarter = quarter(cohort, with_year = T))
I now have a data frame showing dates and the quarter each date falls in:
calendar %>% head
      cohort Quarter
1 2020-10-01  2020.4
2 2020-10-02  2020.4
3 2020-10-03  2020.4
4 2020-10-04  2020.4
5 2020-10-05  2020.4
6 2020-10-06  2020.4
I also know how many days fall in each quarter:
calendar$Quarter %>% table
.
2020.4 2021.1 2021.2 2021.3 2021.4
    92     90     91     92     92
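I want to mutate a new column "daily_budget" that divides each quarter's budget by the number of days in that quarter.
For example, the budget for 2020 Q4 is 1000000 and that quarter has 92 days, so the daily budget is 1000000 / 92 = 10869.57.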
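The arithmetic in that example can be checked with a few lines of base R. This is only a sanity-check sketch; the names `q4_days`, `n_days` and `q4_daily_budget` are made up for illustration and are not part of the original code.

```r
# Every date in 2020 Q4, built with base R only (no packages needed)
q4_days <- seq(as.Date("2020-10-01"), as.Date("2020-12-31"), by = "day")

# Number of days in the quarter: 31 (Oct) + 30 (Nov) + 31 (Dec)
n_days <- length(q4_days)

# Quarterly budget spread evenly over the days of the quarter
q4_daily_budget <- 1000000 / n_days

round(q4_daily_budget, 2)
```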
Can I somehow integrate this calculation into my dplyr pipeline, right after mutate(Quarter = quarter(cohort, with_year = T))?
First, let's put the budgets in a data frame:
budgets <- c(budget_2020_q4 = 1000000,
             budget_2021_q1 = 2000000,
             budget_2021_q2 = 3000000,
             budget_2021_q3 = 3000000,
             budget_2021_q4 = 2000000) %>%
  enframe(name = "Quarter", value = "budget") %>%
  mutate(Quarter = as.numeric(str_replace(str_remove(Quarter, "budget_"), "_q", ".")))
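To make the string handling in that last mutate step easier to follow, here is the same transformation applied to a single name, one step at a time. It is a base R sketch that uses `sub()` in place of the stringr functions; the variable names `label`, `stripped`, `dotted` and `quarter_key` are illustrative only.

```r
label <- "budget_2020_q4"

# Drop the "budget_" prefix, leaving "2020_q4"
stripped <- sub("budget_", "", label, fixed = TRUE)

# Turn the "_q" separator into a decimal point, giving "2020.4"
dotted <- sub("_q", ".", stripped, fixed = TRUE)

# Convert to a number so it has the same form as the Quarter column
quarter_key <- as.numeric(dotted)
```

The resulting key is a double, so the join that follows matches on floating-point values; a character key would be an alternative if that ever causes mismatches.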
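Then it is a matter of counting the number of rows per Quarter (count is the tidyverse alternative to table; add_count appends the count as a column n), joining the budgets, and dividing the two: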
calendar %>%
  add_count(Quarter) %>%
  left_join(budgets, by = "Quarter") %>%
  mutate(budget_by_day = budget / n)
This yields:
       cohort Quarter  n budget budget_by_day
1  2020-10-01  2020.4 92  1e+06      10869.57
2  2020-10-02  2020.4 92  1e+06      10869.57
3  2020-10-03  2020.4 92  1e+06      10869.57
4  2020-10-04  2020.4 92  1e+06      10869.57
5  2020-10-05  2020.4 92  1e+06      10869.57
6  2020-10-06  2020.4 92  1e+06      10869.57
7  2020-10-07  2020.4 92  1e+06      10869.57
8  2020-10-08  2020.4 92  1e+06      10869.57
9  2020-10-09  2020.4 92  1e+06      10869.57
10 2020-10-10  2020.4 92  1e+06      10869.57
...
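If the helper columns n and budget are not wanted, a possible variant is to do the division inside a grouped mutate, with `n()` supplying the number of rows per quarter. The sketch below is an alternative to the join, not the approach shown above. It assumes the budgets are kept in a named vector keyed by a character label such as "2020.4"; `budget_lookup` and `calendar2` are hypothetical names introduced here.

```r
library(dplyr)
library(lubridate)

# Assumed lookup: quarterly budgets keyed by a "year.quarter" character label
budget_lookup <- c("2020.4" = 1000000,
                   "2021.1" = 2000000,
                   "2021.2" = 3000000,
                   "2021.3" = 3000000,
                   "2021.4" = 2000000)

calendar2 <- data.frame(
  cohort = seq(ymd("2020-10-01"), ymd("2021-12-31"), by = "1 day")) %>%
  # Character key instead of a double, so the lookup is by exact name
  mutate(Quarter = paste0(year(cohort), ".", quarter(cohort))) %>%
  group_by(Quarter) %>%
  # n() is the number of rows (days) in the current quarter
  mutate(daily_budget = unname(budget_lookup[Quarter]) / n()) %>%
  ungroup()

head(calendar2)
```

The grouped version keeps everything in one pipeline, while the join version is easier to extend when the budgets already live in their own table.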