I have a data frame (df) that looks like this:
library(dplyr)
library(lubridate)
id gender education e_week
1 100236 0 Bachelor or equivalent 2012-01-22
2 100237 0 Secondary education 2010-03-14
3 100248 0 Master and doctoral 2010-04-25
4 100257 0 Master and doctoral 2012-01-22
5 100271 0 Bachelor or equivalent 2011-05-22
6 100285 0 Primary education 2012-01-15
7 100303 0 Master and doctoral 2013-01-13
8 100305 0 Secondary education 2011-09-25
9 100316 0 Secondary education 2012-12-30
10 100354 0 Secondary education 2010-08-22
The real dataset is much longer. I derived the 'week' variable from the original date with:
df <- df %>%
  mutate(e_week = floor_date(date_exit, unit = "week"))
The next step is to create dummy variables for different time "windows" around a date of interest. At first I created them by hand, like this:
df <- df %>%
  mutate(treshold_1week = ifelse(e_week %within%
                                   interval(start = as.Date('2009-05-17') - weeks(x = 1),
                                            end = '2009-05-17'),
                                 1, 0))
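This covers the one week before the date of interest. I built the 2-, 3-, 4-, 5- and 6-week windows before and after the date of interest by hand in the same way. Now I want to extend the windows to 40 weeks before and after the date of interest. Is there a faster, more efficient way to do this without writing a new ifelse() call for every dummy variable?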
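The challenge is that I want a new dummy variable for each week closer to the date of interest. So I am looking for 40 dummy variables, each representing a progressively shorter interval, i.e. treshold_40weeks, treshold_39weeks, treshold_38weeks, and so on.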
Using dplyr and purrr:
library(dplyr)
library(purrr)
library(lubridate)
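# Example data: 300 weekly dates starting on 2008-01-01
data <- tibble(e_week = seq(as.Date("2008-01-01"), by = "7 days", length.out = 300))
# Window widths, in weeks
week <- seq(1, 40, by = 1)
# Add one dummy column for the x-week window that ends on the date of interest
generate_dummy <- function(x, df) {
  df %>%
    mutate("threshod_{x}week" := ifelse(e_week %within%
                                          interval(start = as.Date('2009-05-17') - weeks(x),
                                                   end = '2009-05-17'),
                                        1, 0))
}
# Build one data frame per width, then join them all on e_week
reduce(map(week, generate_dummy, df = data), .f = left_join, by = "e_week")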
Output:
e_week threshod_1week threshod_2week threshod_3week
Min. :2008-01-01 Min. :0.000000 Min. :0.000000 Min. :0.00
1st Qu.:2009-06-07 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00
Median :2010-11-12 Median :0.000000 Median :0.000000 Median :0.00
Mean :2010-11-12 Mean :0.003333 Mean :0.006667 Mean :0.01
3rd Qu.:2012-04-18 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00
Max. :2013-09-24 Max. :1.000000 Max. :1.000000 Max. :1.00
threshod_4week threshod_5week threshod_6week threshod_7week
Min. :0.00000 Min. :0.00000 Min. :0.00 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000
Median :0.00000 Median :0.00000 Median :0.00 Median :0.00000
Mean :0.01333 Mean :0.01667 Mean :0.02 Mean :0.02333
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00000 Max. :1.00 Max. :1.00000
threshod_8week threshod_9week threshod_10week threshod_11week
Min. :0.00000 Min. :0.00 Min. :0.00000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00000 Median :0.00 Median :0.00000 Median :0.00000
Mean :0.02667 Mean :0.03 Mean :0.03333 Mean :0.03667
3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00 Max. :1.00000 Max. :1.00000
threshod_12week threshod_13week threshod_14week threshod_15week
Min. :0.00 Min. :0.00000 Min. :0.00000 Min. :0.00
1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00
Median :0.00 Median :0.00000 Median :0.00000 Median :0.00
Mean :0.04 Mean :0.04333 Mean :0.04667 Mean :0.05
3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00
Max. :1.00 Max. :1.00000 Max. :1.00000 Max. :1.00
threshod_16week threshod_17week threshod_18week threshod_19week
Min. :0.00000 Min. :0.00000 Min. :0.00 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000
Median :0.00000 Median :0.00000 Median :0.00 Median :0.00000
Mean :0.05333 Mean :0.05667 Mean :0.06 Mean :0.06333
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00000 Max. :1.00 Max. :1.00000
threshod_20week threshod_21week threshod_22week threshod_23week
Min. :0.00000 Min. :0.00 Min. :0.00000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00000 Median :0.00 Median :0.00000 Median :0.00000
Mean :0.06667 Mean :0.07 Mean :0.07333 Mean :0.07667
3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00 Max. :1.00000 Max. :1.00000
threshod_24week threshod_25week threshod_26week threshod_27week
Min. :0.00 Min. :0.00000 Min. :0.00000 Min. :0.00
1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00
Median :0.00 Median :0.00000 Median :0.00000 Median :0.00
Mean :0.08 Mean :0.08333 Mean :0.08667 Mean :0.09
3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00
Max. :1.00 Max. :1.00000 Max. :1.00000 Max. :1.00
threshod_28week threshod_29week threshod_30week threshod_31week
Min. :0.00000 Min. :0.00000 Min. :0.0 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0 1st Qu.:0.0000
Median :0.00000 Median :0.00000 Median :0.0 Median :0.0000
Mean :0.09333 Mean :0.09667 Mean :0.1 Mean :0.1033
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0 3rd Qu.:0.0000
Max. :1.00000 Max. :1.00000 Max. :1.0 Max. :1.0000
threshod_32week threshod_33week threshod_34week threshod_35week
Min. :0.0000 Min. :0.00 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.00 Median :0.0000 Median :0.0000
Mean :0.1067 Mean :0.11 Mean :0.1133 Mean :0.1167
3rd Qu.:0.0000 3rd Qu.:0.00 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.00 Max. :1.0000 Max. :1.0000
threshod_36week threshod_37week threshod_38week threshod_39week
Min. :0.00 Min. :0.0000 Min. :0.0000 Min. :0.00
1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00
Median :0.00 Median :0.0000 Median :0.0000 Median :0.00
Mean :0.12 Mean :0.1233 Mean :0.1267 Mean :0.13
3rd Qu.:0.00 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00
Max. :1.00 Max. :1.0000 Max. :1.0000 Max. :1.00
threshod_40week
Min. :0.0000
1st Qu.:0.0000
Median :0.0000
Mean :0.1333
3rd Qu.:0.0000
Max. :1.0000
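The question also asks for windows after the date of interest, which the code above does not cover. One possible way to get both directions, and to skip the 40 joins, is to build each dummy as a plain vector and bind the columns once. This is only a sketch: the helper window_dummy, the objects event_date and weeks_out, and the before_/after_ column names are made up for illustration, and the date comparison assumes e_week is a Date column with windows inclusive at both ends.

```r
library(dplyr)
library(purrr)
library(lubridate)

# Same toy data as in the answer: 300 weekly dates starting on 2008-01-01
data <- tibble(e_week = seq(as.Date("2008-01-01"), by = "7 days", length.out = 300))

event_date <- as.Date("2009-05-17")  # date of interest, taken from the question
weeks_out  <- 1:40                   # window widths in weeks

# Hypothetical helper: returns a 0/1 vector for one window width.
# direction = -1 looks back from the event date, direction = 1 looks forward.
window_dummy <- function(w, dates, event, direction = -1) {
  other_end <- event + direction * 7 * w
  lo <- min(event, other_end)
  hi <- max(event, other_end)
  as.integer(dates >= lo & dates <= hi)  # both ends inclusive
}

# One vector per width, looking back from the event date
before <- map(weeks_out, window_dummy,
              dates = data$e_week, event = event_date, direction = -1)
names(before) <- paste0("before_", weeks_out, "week")

# One vector per width, looking forward from the event date
after <- map(weeks_out, window_dummy,
             dates = data$e_week, event = event_date, direction = 1)
names(after) <- paste0("after_", weeks_out, "week")

# Bind all 80 dummy columns to the original data in one step
result <- bind_cols(data, as_tibble(before), as_tibble(after))
```

On this toy data the before_ columns should agree with the threshod_ columns above, since both treat the window as closed at both ends. With a real data frame, replace data$e_week with your own week column.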