R - Constructing a lagged variable based on additional conditions



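I want to create a lagged variable based on the following additional conditions and operations:

  • When the lag (previous row) of the variable day_active is 1, it should take the lag of the variable n_wins.

  • When the lag (previous row) of day_active is 0, it should just repeat the previous row's lagged value for as long as day_active stays 0.

Suppose we observe a game player over ten days. day_active indicates whether he was active that day, and n_wins is the number of matches he won.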

Example dataset:
da = data.frame(day = c(1,2,3,4,5,6,7,8,9,10), day_active = c(1,1,0,0,1,1,0,0,1,1), n_wins = c(2,3,0,0,1,0,0,0,0,1))
da
   day day_active n_wins
1    1          1      2
2    2          1      3
3    3          0      0
4    4          0      0
5    5          1      1
6    6          1      0
7    7          0      0
8    8          0      0
9    9          1      0
10  10          1      1

This is what it should look like after the transformation:

da2 = data.frame(day = c(1,2,3,4,5,6,7,8,9,10), day_active = c(1,1,0,0,1,1,0,0,1,1), n_wins = c(2,3,0,0,1,0,0,0,0,1), lag_n_wins = c(NA,2,3,3,3,1,0,0,0,0))
da2
   day day_active n_wins lag_n_wins
1    1          1      2         NA
2    2          1      3          2
3    3          0      0          3
4    4          0      0          3
5    5          1      1          3
6    6          1      0          1
7    7          0      0          0
8    8          0      0          0
9    9          1      0          0
10  10          1      1          0
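As a sanity check on the two rules, here is a plain row-by-row loop in base R. It is a minimal sketch, not an efficient solution: `lag_wins_loop` is a hypothetical helper name, the first row is assumed to be NA because it has no previous row, and rule 2 is read as "repeat the previous row's lagged value", which is what the expected output above shows.

```r
# Literal row-by-row reading of the two rules (illustrative sketch).
# lag_wins_loop is a hypothetical helper name, not from the question.
lag_wins_loop <- function(day_active, n_wins) {
  n <- length(n_wins)
  out <- rep(NA_real_, n)          # row 1 has no previous row, so it stays NA
  if (n < 2) return(out)
  for (i in 2:n) {
    if (day_active[i - 1] == 1) {
      out[i] <- n_wins[i - 1]      # rule 1: previous day active, take lagged n_wins
    } else {
      out[i] <- out[i - 1]         # rule 2: previous day inactive, repeat previous value
    }
  }
  out
}

da <- data.frame(day = 1:10,
                 day_active = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1),
                 n_wins = c(2, 3, 0, 0, 1, 0, 0, 0, 0, 1))
da$lag_n_wins <- lag_wins_loop(da$day_active, da$n_wins)
da$lag_n_wins
# [1] NA  2  3  3  3  1  0  0  0  0
```

The loop reproduces the lag_n_wins column of da2, which confirms that the expected output follows the two rules.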

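We can create a grouping column by taking the cumulative sum of a logical vector that marks the 1s in 'day_active'. Within each group, unless all the n_wins values are 0, replace the 0s with NA and fill each NA with the previous non-NA element using na.locf0 (from zoo). Finally, ungroup and take the lag of the created column.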

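library(dplyr)    # zoo must also be installed for zoo::na.locf0
da %>%
  # a new group starts at every active day
  group_by(grp = cumsum(day_active == 1)) %>%
  # within a group, turn 0s into NA (unless the whole group is 0) and carry the last value forward
  mutate(lag_n_wins = zoo::na.locf0(if (all(n_wins == 0)) n_wins
                                    else na_if(n_wins, 0))) %>%
  ungroup() %>%
  # shift down one row so each day sees the value carried up to the previous day
  mutate(lag_n_wins = lag(lag_n_wins)) %>%
  select(-grp)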
# A tibble: 10 x 4
#     day day_active n_wins lag_n_wins
#   <dbl>      <dbl>  <dbl>      <dbl>
# 1     1          1      2         NA
# 2     2          1      3          2
# 3     3          0      0          3
# 4     4          0      0          3
# 5     5          1      1          3
# 6     6          1      0          1
# 7     7          0      0          0
# 8     8          0      0          0
# 9     9          1      0          0
#10    10          1      1          0
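For comparison, here is a base R sketch of the same result without dplyr or zoo. Instead of grouping and na.locf0, it assumes the value to carry forward is simply n_wins from the most recent active day, which is what the two rules amount to; `last_active` and `carried` are illustrative names.

```r
# Base R sketch: carry n_wins forward from the most recent active day, then lag by one row.
da <- data.frame(day = 1:10,
                 day_active = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1),
                 n_wins = c(2, 3, 0, 0, 1, 0, 0, 0, 0, 1))

# Index of the most recent active row at or before each row (0 means none yet).
last_active <- cummax(ifelse(da$day_active == 1, seq_len(nrow(da)), 0))

# n_wins of that row; NA where no active day has occurred yet.
# pmax(..., 1) only keeps the subscript valid; those positions become NA anyway.
carried <- ifelse(last_active > 0, da$n_wins[pmax(last_active, 1)], NA)

# Shift down one row: each day sees the value carried up to the previous day.
da$lag_n_wins <- c(NA, head(carried, -1))
da$lag_n_wins
# [1] NA  2  3  3  3  1  0  0  0  0
```

On the example data this gives the same column as the dplyr pipeline. Unlike the na_if(n_wins, 0) step, it does not rely on inactive days having n_wins equal to 0.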
