I wrote a (fairly naive) function that randomly selects a date/time between two specified days:
# set start and end dates to sample between
day.start <- "2012/01/01"
day.end <- "2012/12/31"
# define a random date/time selection function
rand.day.time <- function(day.start,day.end,size) {
dayseq <- seq.Date(as.Date(day.start),as.Date(day.end),by="day")
dayselect <- sample(dayseq,size,replace=TRUE)
hourselect <- sample(0:23,size,replace=TRUE) # valid hours are 0-23
minselect <- sample(0:59,size,replace=TRUE)
as.POSIXlt(paste0(dayselect," ",hourselect,":",minselect))
}
Which gives:
> rand.day.time(day.start,day.end,size=3)
[1] "2012-02-07 21:42:00" "2012-09-02 07:27:00" "2012-06-15 01:13:00"
But it slows down considerably as the sample size grows:
# some benchmarking
> system.time(rand.day.time(day.start,day.end,size=100000))
user system elapsed
4.68 0.03 4.70
> system.time(rand.day.time(day.start,day.end,size=200000))
user system elapsed
9.42 0.06 9.49
Can anyone suggest a more efficient way to do this?
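One likely cost in the function above is that it builds strings with paste() and then re-parses them with as.POSIXlt(). A minimal sketch of the same minute-resolution sampling done purely arithmetically, assuming UTC is acceptable and that day.end should be included in full; the helper name rand_minute_time is hypothetical, not something from the question:

```r
# Hypothetical helper (not from the question): draw whole-minute times
# uniformly from day.start 00:00 through day.end 23:59, in UTC.
rand_minute_time <- function(day.start, day.end, size) {
  start <- as.POSIXct(day.start, format = "%Y/%m/%d", tz = "UTC")
  # number of whole days in the range, counting day.end itself
  n_days <- as.integer(as.Date(day.end, format = "%Y/%m/%d") -
                       as.Date(day.start, format = "%Y/%m/%d")) + 1L
  n_minutes <- n_days * 24L * 60L
  # integer minute offsets 0 .. n_minutes - 1, converted to seconds below
  offsets <- sample.int(n_minutes, size, replace = TRUE) - 1L
  start + offsets * 60
}
```

Called as rand_minute_time(day.start, day.end, size = 3), it returns a POSIXct vector; no character conversion happens anywhere.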
Ahh, yet another date/time problem we can reduce to working in floats :)
Try this function:
R> latemail <- function(N, st="2012/01/01", et="2012/12/31") {
+ st <- as.POSIXct(as.Date(st))
+ et <- as.POSIXct(as.Date(et))
+ dt <- as.numeric(difftime(et,st,unit="sec"))
+ ev <- sort(runif(N, 0, dt))
+ rt <- st + ev
+ }
R>
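We compute the difftime in seconds, then "merely" draw uniform random numbers over that range and sort the result. Add those offsets to the start time and you're done: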
R> set.seed(42); print(latemail(5)) ## round to date, or hour, or ...
[1] "2012-04-14 05:34:56.369022 CDT" "2012-08-22 00:41:26.683809 CDT"
[3] "2012-10-29 21:43:16.335659 CDT" "2012-11-29 15:42:03.387701 CST"
[5] "2012-12-07 18:46:50.233761 CST"
R> system.time(latemail(100000))
user system elapsed
0.024 0.000 0.021
R> system.time(latemail(200000))
user system elapsed
0.044 0.000 0.045
R> system.time(latemail(10000000)) ## a few more than in your example :)
user system elapsed
3.240 0.172 3.428
R>
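The printed results above are fractional seconds shown in the local time zone (CDT/CST), and the comment on the print call suggests rounding afterwards. A minimal sketch of one such variant, assuming UTC and truncation to whole minutes are what you want; latemail_utc_min is a hypothetical name, not part of the answer:

```r
# Hypothetical variant of latemail(): same "uniform seconds since start" idea,
# but with both endpoints parsed in UTC and results truncated to the minute.
latemail_utc_min <- function(N, st = "2012/01/01", et = "2012/12/31") {
  st <- as.POSIXct(st, format = "%Y/%m/%d", tz = "UTC")
  et <- as.POSIXct(et, format = "%Y/%m/%d", tz = "UTC")
  dt <- as.numeric(difftime(et, st, units = "secs"))
  ev <- sort(runif(N, 0, dt))
  # floor each offset to a multiple of 60 seconds
  st + floor(ev / 60) * 60
}
```

As in the original, et is midnight at the start of the end date, so times fall before 2012-12-31 00:00; pass the following day as et if the last day should be covered.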
Something like this will also work. Sorry about the random data frame; I only put it in so you can see a plot.
data=as.data.frame(list(ID=1:10,
variable=rnorm(10,50,10)))
#This function will generate a uniform sample of dates from
#within a designated start and end date:
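rand.date=function(start.day,end.day,data){
size=nrow(data)
days=seq.Date(as.Date(start.day),as.Date(end.day),by="day")
# sample integer indices so every day, including the last, is equally likely
# (runif() indices get truncated, which almost never picks the last day)
pick.day=sample.int(length(days),size,replace=TRUE)
days[pick.day]
}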
#This will create a new column within your data frame called date:
data$date=rand.date("2014-01-01","2014-02-28",data)
#and this will order your data frame by date:
data=data[order(data$date),]
#Finally, you can see how the data looks
plot(data$date,data$variable,type="b")
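The function above builds the full sequence of days and indexes into it. A minimal alternative sketch that skips the sequence and adds random integer day offsets to the start date instead; rand_date_arith is a hypothetical name, and it takes the sample size directly rather than a data frame:

```r
# Hypothetical helper: uniform random dates in [start.day, end.day], inclusive,
# using Date arithmetic instead of building the full day sequence.
rand_date_arith <- function(start.day, end.day, size) {
  start <- as.Date(start.day)
  n_days <- as.integer(as.Date(end.day) - start) + 1L
  # sample.int() returns 1..n_days; subtract 1 to get offsets 0..n_days - 1
  start + (sample.int(n_days, size, replace = TRUE) - 1L)
}
```

It can stand in for the call above, e.g. data$date = rand_date_arith("2014-01-01", "2014-02-28", nrow(data)).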