数据帧当前看起来像这个
# A tibble: 20 x 3
Badge `Effective Date` day_off
<dbl> <dttm> <int>
1 3162 2013-01-16 00:00:00 1
2 3162 2013-01-19 00:00:00 2
3 3162 2013-02-21 00:00:00 3
5 3585 2015-10-21 00:00:00 5
6 3586 2014-05-21 00:00:00 6
7 3586 2014-05-23 00:00:00 7
我想为每个生效日期之间的每个徽章编号每天创建一个新行,使其看起来像这样。数据帧是巨大的,所以一些像complete这样资源密集型的整洁的verse函数是不起作用的。
# A tibble: 20 x 3
Badge `Effective Date` day_off
<dbl> <dttm> <int>
1 3162 2013-01-16 00:00:00 1
2 3162 2013-01-17 00:00:00. 1
3 3162 2013-01-18 00:00:00. 1
4 3162 2013-01-19 00:00:00 2
5 3162 2013-01-20 00:00:00 2
6 3162 2013-01-21 00:00:00 3
7 3585 2015-10-21 00:00:00 5
8 3586 2014-05-21 00:00:00 6
9 3586 2014-05-22 00:00:00 6
10 3586 2014-05-23 00:00:00 7
您可以创建一个表,其中对于每个Badge组,您都有从第一个到最后一个的日期时间序列。然后对该数据帧进行滚动连接,得到所需的输出
library(data.table)
## Create reproducible example as an R object
# Please do this yourself next time using dput. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
df <- fread('
Badge , Effective_Date , day_off
3162 , 2013-01-16 00:00:00 , 1
3162 , 2013-01-19 00:00:00 , 2
3162 , 2013-01-21 00:00:00 , 3
3585 , 2015-10-21 00:00:00 , 5
3586 , 2014-05-21 00:00:00 , 6
3586 , 2014-05-23 00:00:00 , 7
')
df[, Effective_Date := as.POSIXct(Effective_Date)]
## Rolling join
setDT(df) # required if data wasn't originally a data.table as above
df[df[, .(Effective_Date = seq(min(Effective_Date), max(Effective_Date), by = '1 day')),
by = .(Badge)],
on = .(Badge, Effective_Date), roll = TRUE]
#> Badge Effective_Date day_off
#> 1: 3162 2013-01-16 1
#> 2: 3162 2013-01-17 1
#> 3: 3162 2013-01-18 1
#> 4: 3162 2013-01-19 2
#> 5: 3162 2013-01-20 2
#> 6: 3162 2013-01-21 3
#> 7: 3585 2015-10-21 5
#> 8: 3586 2014-05-21 6
#> 9: 3586 2014-05-22 6
#> 10: 3586 2014-05-23 7
创建于2021-07-16由reprex包(v2.0.0(
tidyverse
方式将使用complete
和fill
-
library(dplyr)
library(tidyr)
df %>%
group_by(Badge) %>%
complete(Effective_Date = seq(min(Effective_Date),
max(Effective_Date), by = '1 day')) %>%
fill(day_off) %>%
ungroup
# Badge Effective_Date day_off
# <int> <dttm> <int>
# 1 3162 2013-01-16 00:00:00 1
# 2 3162 2013-01-17 00:00:00 1
# 3 3162 2013-01-18 00:00:00 1
# 4 3162 2013-01-19 00:00:00 2
# 5 3162 2013-01-20 00:00:00 2
# 6 3162 2013-01-21 00:00:00 3
# 7 3585 2015-10-21 00:00:00 5
# 8 3586 2014-05-21 00:00:00 6
# 9 3586 2014-05-22 00:00:00 6
#10 3586 2014-05-23 00:00:00 7