我正在将未知数量的多个日期周期拟合在一个主要的周期序列中。有没有比我的方法更可读或更紧凑的方法?我在这里混合了sqldf和data.table,因为我在项目中使用了这两个库,但如果需要,可以在纯data.table中使用。反之亦然。其他开发良好的库也可以
没有硬编码,我用几个选项运行这个,不知道周期内的最大周期数。周期内的数量没有理论上限(可能是365),但根据规范,基本的健全性检查应该将最大值设定在6-15个周期左右。
这是我使用atm的工作代码的表示(在原始脚本中作为函数编写)。
我相信应该有一个更短/更可读的函数。在dcast部分,我们能做些什么来计算差异并应用求和函数吗?我试过了,但fun.aggregate似乎只限于更简单的操作。
library("data.table")
library("sqldf")
Data <- data.table(
Fnr = c(22516, 22516, 22516, 45459, 45459),
Vernr = c(1,1,1,1,2),
Startdat = c("2010-01-01", "2010-01-01", "2012-01-01",
"2013-04-01", "2013-04-01"),
Endat = c("2010-12-31", "2010-12-31", "2012-05-19",
"2014-03-31", "2014-03-31"),
Fromdat = c("2010-02-21", "2010-08-16", NA, "2013-08-31", "2014-01-02"),
Tomdat = c("2010-05-16", "2010-09-11", NA, "2013-10-27", "2014-02-13")
)
tmp.eval <- "list(Fnr, Vernr, Startdat, Endat)"
dt_tmp <- Data[, nobs_id := order(Fromdat),
by = eval(parse(text = tmp.eval))]
dt_tmp <- dt_tmp[, c("Fromdat", "Tomdat") := list(as.Date(Fromdat),
as.Date(Tomdat))]
dt_tmp <- dcast(dt_tmp, Fnr + Vernr + ... ~ nobs_id,
value.var = c("Fromdat", "Tomdat"))
dt.colnames <- data.table(colnames(dt_tmp))
dt.col1 <- dt.colnames[substr(V1,1,3) == "Fro"][, c("nobs_id", "Fromdat") :=
list(order(V1),V1)]
dt.col2 <- dt.colnames[substr(V1,1,3) == "Tom"][, c("nobs_id", "Tomdat") :=
list(order(V1),V1)]
dt.set <- merge(dt.col1[,V1 := NULL], dt.col2[,V1 := NULL],
by = "nobs_id")
dt.set <- dt.set[, diff_col := paste(Tomdat,Fromdat, sep = "-")]
dt.set <- dt.set[, diff_col := paste(diff_col, " diff_",nobs_id, sep = "")]
dt.set <- dt.set[, diffvar_col := paste("as.numeric(diff_",nobs_id,")",
sep = "")]
str.diff <- paste(dt.set$diff_col, collapse = ",")
str.diffvar <- paste(dt.set$diffvar_col, collapse = ",")
str.diffvar <- paste("sum(", str.diffvar, ", na.rm = TRUE)")
dt_tmp <- sqldf(sprintf("SELECT *, %s FROM dt_tmp",
str.diff)
)
dt_tmp <- setDT(dt_tmp)[Startdat <= Endat,
corr_days := eval(parse (text = str.diffvar)),
by = list(Fnr, Vernr, Startdat, Endat)]
我不能100%确定你想做什么,但这里有一种方法可以获得dt_tmp
的相同最终输出。假设我正确地解释了你想要做的事情,那么诀窍就是先做你的计算,然后融化它,然后铸造它。
Data[, nobs_id := order(Fromdat),
by = list(Fnr, Vernr, Startdat, Endat)]
Data[,Tomdat:=as.IDate(Tomdat)]
Data[,Fromdat:=as.IDate(Fromdat)]
Data[,diff:=as.integer(Tomdat-Fromdat)]
Data[,corr_days:=sum(diff),by= list(Fnr, Vernr, Startdat, Endat)]
mytemp<-dcast(melt(Data,id.vars=c('Fnr','Vernr','Startdat','Endat','nobs_id','corr_days')),Fnr+Vernr+Startdat+Endat+corr_days~variable+nobs_id,value.var='value')
#if you want them in the same order as your dt_tmp
setcolorder(mytemp, c(names(mytemp)[!names(mytemp)%in%'corr_days'],'corr_days'))
#if you want the date columns to be type character again
datecols<-names(mytemp)[grepl('dat',names(mytemp))]
mytemp[,(datecols):=lapply(.SD,as.character),.SDcols=datecols]