r-按id为多个日期之间的每个日期创建一行



数据帧当前看起来像这个

# A tibble: 20 x 3
Badge `Effective Date`    day_off
<dbl> <dttm>                <int>
1  3162 2013-01-16 00:00:00       1
2  3162 2013-01-19 00:00:00       2
3  3162 2013-02-21 00:00:00       3
5  3585 2015-10-21 00:00:00       5
6  3586 2014-05-21 00:00:00       6
7  3586 2014-05-23 00:00:00       7

我想为每个生效日期之间的每个徽章编号每天创建一个新行,使其看起来像这样。数据帧是巨大的,所以一些像complete这样资源密集型的整洁的verse函数是不起作用的。

# A tibble: 20 x 3
Badge `Effective Date`    day_off
<dbl> <dttm>                <int>
1  3162 2013-01-16 00:00:00       1
2  3162 2013-01-17 00:00:00.      1
3  3162 2013-01-18 00:00:00.      1
4  3162 2013-01-19 00:00:00       2
5  3162 2013-01-20 00:00:00       2
6  3162 2013-01-21 00:00:00       3
7  3585 2015-10-21 00:00:00       5
8  3586 2014-05-21 00:00:00       6
9  3586 2014-05-22 00:00:00       6
10 3586 2014-05-23 00:00:00       7

您可以创建一个表,其中对于每个Badge组,您都有从第一个到最后一个的日期时间序列。然后对该数据帧进行滚动连接,得到所需的输出

library(data.table)
## Create reproducible example as an R object 
# Please do this yourself next time using dput. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
df <- fread('
Badge , Effective_Date      , day_off
3162 , 2013-01-16 00:00:00 ,      1
3162 , 2013-01-19 00:00:00 ,      2
3162 , 2013-01-21 00:00:00 ,      3
3585 , 2015-10-21 00:00:00 ,      5
3586 , 2014-05-21 00:00:00 ,      6
3586 , 2014-05-23 00:00:00 ,      7
')
df[, Effective_Date := as.POSIXct(Effective_Date)]
## Rolling join
setDT(df) # required if data wasn't originally a data.table as above
df[df[, .(Effective_Date = seq(min(Effective_Date), max(Effective_Date), by = '1 day')),
by = .(Badge)],
on = .(Badge, Effective_Date), roll = TRUE]
#>     Badge Effective_Date day_off
#>  1:  3162     2013-01-16       1
#>  2:  3162     2013-01-17       1
#>  3:  3162     2013-01-18       1
#>  4:  3162     2013-01-19       2
#>  5:  3162     2013-01-20       2
#>  6:  3162     2013-01-21       3
#>  7:  3585     2015-10-21       5
#>  8:  3586     2014-05-21       6
#>  9:  3586     2014-05-22       6
#> 10:  3586     2014-05-23       7

创建于2021-07-16由reprex包(v2.0.0(

tidyverse方式将使用completefill-

library(dplyr)
library(tidyr)
df %>%
group_by(Badge) %>%
complete(Effective_Date = seq(min(Effective_Date), 
max(Effective_Date), by = '1 day')) %>%
fill(day_off) %>%
ungroup
#    Badge Effective_Date      day_off
#   <int> <dttm>                <int>
# 1  3162 2013-01-16 00:00:00       1
# 2  3162 2013-01-17 00:00:00       1
# 3  3162 2013-01-18 00:00:00       1
# 4  3162 2013-01-19 00:00:00       2
# 5  3162 2013-01-20 00:00:00       2
# 6  3162 2013-01-21 00:00:00       3
# 7  3585 2015-10-21 00:00:00       5
# 8  3586 2014-05-21 00:00:00       6
# 9  3586 2014-05-22 00:00:00       6
#10  3586 2014-05-23 00:00:00       7

相关内容

  • 没有找到相关文章