动态比较 R 中 data.table 中的列



我有一个data.table,它有多个日期列。这些是K日期列,因此K更改。我已经计算了列之间的时间差:

K <- numberOfYears
dateCols = c("fromdatenext", paste0("fromdate" , 1:K))
# create formulas dynamically 
all_operations = lapply(seq_len(length(dateCols) - 1), function(i){
as.formula(paste("~difftime(", dateCols[i + 1], ",", dateCols[i],", units = c('weeks'))"))
})
df %>%
mutate_(.dots = setNames(all_operations, paste0("Diff", seq_len(length(dateCols) - 1))))

并得到类似的东西:

fromdatenext  fromdate1  fromdate2  fromdate3  fromdate4           Diff1           Diff2           Diff3           Diff4
1   2018-01-01 2017-01-01 2016-01-01 2015-01-01 2014-01-01 -52.14286 weeks -52.28571 weeks -52.14286 weeks -52.14286 weeks
2   2018-10-01 2017-10-01 2016-10-01 2015-10-01 2014-10-01 -52.14286 weeks -52.14286 weeks -52.28571 weeks -53.14286 weeks
3   2018-09-08 2017-09-08 2016-09-08 2015-09-08 2014-09-08 -52.14286 weeks -52.14286 weeks -52.28571 weeks -52.14286 weeks
4   2018-09-22 2017-09-22 2016-09-22 2015-09-22 2014-09-22 -52.14286 weeks -52.14286 weeks -52.28571 weeks -52.14286 weeks
5   2018-05-01 2017-05-01 2016-05-01 2015-05-01 2014-05-01 -52.14286 weeks -52.14286 weeks -52.28571 weeks -52.14286 weeks
6   2018-01-01 2017-01-01 2016-01-01 2015-01-01 2014-01-01 -50.14286 weeks -52.28571 weeks -52.14286 weeks -52.14286 weeks

现在我需要删除所有时间间隔不相等的行。如何动态完成此操作,以便在K时它仍然有效?

所以期望的结果将是:

fromdatenext  fromdate1  fromdate2  fromdate3  fromdate4           Diff1           Diff2           Diff3           Diff4
1   2018-01-01 2017-01-01 2016-01-01 2015-01-01 2014-01-01 -52.14286 weeks -52.28571 weeks -52.14286 weeks -52.14286 weeks
3   2018-09-08 2017-09-08 2016-09-08 2015-09-08 2014-09-08 -52.14286 weeks -52.14286 weeks -52.28571 weeks -52.14286 weeks
4   2018-09-22 2017-09-22 2016-09-22 2015-09-22 2014-09-22 -52.14286 weeks -52.14286 weeks -52.28571 weeks -52.14286 weeks
5   2018-05-01 2017-05-01 2016-05-01 2015-05-01 2014-05-01 -52.14286 weeks -52.14286 weeks -52.28571 weeks -52.14286 weeks

一般来说,我认为这种类型的数据(结构相同/目的列的可变数量(在"长"格式中可能会做得更好,在这种情况下,我建议你的方法应该是:

library(dplyr)
library(tidyr) # pivot_*
dat2 <- dat %>%
mutate(rn = row_number()) %>%
pivot_longer(matches("fromdate[0-9]+"), names_to = "num", values_to = "fromdate") %>%
group_by(rn) %>%
mutate(
fromdatenext = as.Date(fromdatenext),
fromdate = as.Date(fromdate),
num = gsub("\D", "", num),
Diff = `units<-`(diff(c(first(fromdatenext), fromdate)), "weeks")
) %>%
ungroup()
dat2
# # A tibble: 24 x 5
#    fromdatenext    rn num   fromdate   Diff          
#    <date>       <int> <chr> <date>     <drtn>        
#  1 2018-01-01       1 1     2017-01-01 -52.1429 weeks
#  2 2018-01-01       1 2     2016-01-01 -52.2857 weeks
#  3 2018-01-01       1 3     2015-01-01 -52.1429 weeks
#  4 2018-01-01       1 4     2014-01-01 -52.1429 weeks
#  5 2018-10-01       2 1     2017-10-01 -52.1429 weeks
#  6 2018-10-01       2 2     2016-10-01 -52.1429 weeks
#  7 2018-10-01       2 3     2015-10-01 -52.2857 weeks
#  8 2018-10-01       2 4     2014-10-01 -52.1429 weeks
#  9 2018-09-08       3 1     2017-09-08 -52.1429 weeks
# 10 2018-09-08       3 2     2016-09-08 -52.1429 weeks
# # ... with 14 more rows

但是,如果您希望/需要它们以更宽的格式,我们可以重新透视回宽格式:

dat2 %>%
pivot_wider(
id_cols = rn:fromdatenext,
names_from = num,
values_from = c(fromdate, Diff),
names_sep = ""
)
# # A tibble: 6 x 10
#      rn fromdatenext fromdate1  fromdate2  fromdate3  fromdate4  Diff1          Diff2          Diff3          Diff4         
#   <int> <date>       <date>     <date>     <date>     <date>     <drtn>         <drtn>         <drtn>         <drtn>        
# 1     1 2018-01-01   2017-01-01 2016-01-01 2015-01-01 2014-01-01 -52.1429 weeks -52.2857 weeks -52.1429 weeks -52.1429 weeks
# 2     2 2018-10-01   2017-10-01 2016-10-01 2015-10-01 2014-10-01 -52.1429 weeks -52.1429 weeks -52.2857 weeks -52.1429 weeks
# 3     3 2018-09-08   2017-09-08 2016-09-08 2015-09-08 2014-09-08 -52.1429 weeks -52.1429 weeks -52.2857 weeks -52.1429 weeks
# 4     4 2018-09-22   2017-09-22 2016-09-22 2015-09-22 2014-09-22 -52.1429 weeks -52.1429 weeks -52.2857 weeks -52.1429 weeks
# 5     5 2018-05-01   2017-05-01 2016-05-01 2015-05-01 2014-05-01 -52.1429 weeks -52.1429 weeks -52.2857 weeks -52.1429 weeks
# 6     6 2018-01-01   2017-01-01 2016-01-01 2015-01-01 2014-01-01 -52.1429 weeks -52.2857 weeks -52.1429 weeks -52.1429 weeks

数据:

dat <- read.table(header = TRUE, text = "
fromdatenext  fromdate1  fromdate2  fromdate3  fromdate4
2018-01-01 2017-01-01 2016-01-01 2015-01-01 2014-01-01
2018-10-01 2017-10-01 2016-10-01 2015-10-01 2014-10-01
2018-09-08 2017-09-08 2016-09-08 2015-09-08 2014-09-08
2018-09-22 2017-09-22 2016-09-22 2015-09-22 2014-09-22
2018-05-01 2017-05-01 2016-05-01 2015-05-01 2014-05-01
2018-01-01 2017-01-01 2016-01-01 2015-01-01 2014-01-01")

最新更新