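I want to create a data frame that summarises the number of appointments per week (weeks running from Sunday to Saturday) from 29 November 2015 to 5 September 2020.
First, I calculated the number of appointments per day as shown below, but now I am struggling with the next step, which is doing the same per week. Do you know a quick way to do this?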
appointments_day <- df %>%
group_by(region, id, name, date) %>%
summarise(appointments = n())
expand.grid(date = seq(min(df$date), max(df$date), by = '1 day')) %>%
left_join(., appointments_day)
My dataset is:
region id name date appointments
A 1 clinic1 29-11-2015 2
A 1 clinic1 26-05-2020 1
A 1 clinic1 28-05-2020 4
A 1 clinic1 01-06-2020 2
A 1 clinic1 03-06-2020 2
A 2 clinic2 25-05-2020 3
A 2 clinic2 26-05-2020 1
A 2 clinic2 27-05-2020 4
B 3 clinic3 06-07-2020 3
B 3 clinic3 08-07-2020 2
B 3 clinic3 09-07-2020 1
I would like to create the following dataset:
region id name first day of the week (as Sunday) appointments
A 1 clinic1 29-11-2015 2
....
A 1 clinic1 24-05-2020 5
A 1 clinic1 31-05-2020 4
....
A 2 clinic2 29-11-2015 0
....
A 2 clinic2 24-05-2020 8
....
B 3 clinic3 29-11-2015 0
....
B 3 clinic3 05-07-2020 6
Based on your dataset, I would do the following:
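library(data.table)
library(lubridate)
# Convert to a data.table (setDT works by reference)
df <- setDT(your_dataset)
# Parse the day-month-year strings into Date objects
df$date <- dmy(df$date)
# Label each row with its week number and year.
# Note: week() counts 7-day blocks from 1 January, so these weeks do not necessarily start on Sunday.
df$week_year <- paste0(week(df$date), '-', year(df$date))
# Count rows (appointments) per clinic and week
result <- df[, .(appointments = .N), by = .(region, id, name, week_year)]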
(You don't have to aggregate by day first. You can run my code on the raw dataset.)
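If the weeks need to start on Sunday and clinics with no appointments in a week should show 0, as in the desired output, one possible variation is sketched below. It is only a sketch under assumptions: the toy table `raw` and the names `week_begin`, `all_weeks`, `clinics` and `grid` are made up for illustration, and it assumes one row per appointment with `date` already parsed. It uses `floor_date()` from lubridate with `week_start = 7` to map each date to the Sunday of its week.

```r
library(data.table)
library(lubridate)

# Hypothetical raw data: one row per appointment, same columns as in the question
raw <- data.table(
  region = c("A", "A", "A", "B"),
  id     = c(1, 1, 2, 3),
  name   = c("clinic1", "clinic1", "clinic2", "clinic3"),
  date   = dmy(c("26-05-2020", "28-05-2020", "25-05-2020", "06-07-2020"))
)

# Map every date to the Sunday that starts its week
raw[, week_begin := floor_date(date, unit = "week", week_start = 7)]

# Count appointments per clinic and week
weekly <- raw[, .(appointments = .N), by = .(region, id, name, week_begin)]

# Every Sunday from 29-11-2015 up to the last week ending on 05-09-2020
all_weeks <- seq(dmy("29-11-2015"), dmy("30-08-2020"), by = "week")

# Cross every clinic with every week, then fill the weeks without appointments with 0
clinics <- unique(raw[, .(region, id, name)])
grid <- clinics[, .(week_begin = all_weeks), by = .(region, id, name)]
result <- merge(grid, weekly,
                by = c("region", "id", "name", "week_begin"),
                all.x = TRUE)
result[is.na(appointments), appointments := 0L]
```

If the data are already aggregated per day with an `appointments` column, as in the table in the question, replacing `.N` with `sum(appointments)` should give the weekly totals instead of row counts.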