r语言 - 如何判断在一组之前和之后的时间之间是否存在时间点?



我试图回答一个关于堆栈溢出的问题(使用R映射多个id),当我被如何完成它卡住了。也就是说,我如何测试在一组前后时间点之间是否存在一个时间点。

帖子中的用户没有做一个可复制的例子,但这里是我想到的。我想用数据帧emtek_file中的前后时间来测试hidenic_file$hidenic_time中的时间点,并返回与每个hidenic_id的时间框架相匹配的emtek_id。海报没有提到它,但似乎有多个emtek_id的每个hidenic_id被返回的可能性。

library(zoo)
date_string <- paste("2001", sample(12, 10, 3), sample(28,10), sep = "-")
time_string <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                 "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
entry_emtek <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
entry_emtek <- entry_emtek[order(entry_emtek)]
exit_emtek <- entry_emtek + 3600 * 24
emtek_file <- data.frame(emtek_id = 1:10, entry_emtek, exit_emtek)
hidenic_id <- 110380:110479
date_string <- paste("2001", sample(12, 100, replace = TRUE), sample(28,100, replace = T), sep = "-")
time_string <- rep(c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                 "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26"),10)
hidenic_time <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
hidenic_time <- hidenic_time[order(hidenic_time)]
hidenic_file <- data.frame(hidenic_id, hidenic_time)
##Here is where I fail to write concise and working code to find what I want. 
combined_file <- list() 
for(i in seq(hidenic_file[,1])) {
  for(j in seq(emtek_file[,1])) {
    if(length(zoo(1, emtek_file[j,2:3]) + zoo(1,hidenic_file[i,2])) == 0) {next}
    if(length(zoo(1, emtek_file[j,2:3]) + zoo(1,hidenic_file[i,2])) == 1) {combined_file[[i]] < c(combinedfile[[i]],emtek_file[j,1])}
  }
  names(combined_file)[i] <- hidenic_file[i,1]
}

由于您没有提供预期的结果,我不确定是否理解您想要做的所有事情。这里有一个使用IRanges包的解决方案。第一次阅读可能不容易理解,但在连续间隔中找到重叠部分是非常有用的。

library(IRanges)
## create a time intervals 
subject <- IRanges(as.numeric(emtek_file$entry_emtek),
        as.numeric(emtek_file$exit_emtek))
## create a time intervals (start=end here)
query <- IRanges(as.numeric(hidenic_file$hidenic_time),
        as.numeric(hidenic_file$hidenic_time))
## find overlaps and extract rows (both time point and intervals)  
emt.ids <- subjectHits(findOverlaps(query,subject))
hid.ids <- queryHits(findOverlaps(query,subject))
cbind(hidenic_file[hid.ids,],emtek_file[emt.ids,])
 hidenic_id        hidenic_time emtek_id         entry_emtek          exit_emtek
8      110387 2001-03-13 22:29:56        3 2001-03-13 22:29:56 2001-03-14 22:29:56
9      110388 2001-03-14 01:03:30        3 2001-03-13 22:29:56 2001-03-14 22:29:56
41     110420 2001-06-09 16:56:26        7 2001-06-09 16:56:26 2001-06-10 16:56:26

Ps:安装包:

  source("http://bioconductor.org/biocLite.R")
  biocLite("IRanges")

最新更新