R - Missing value issue caused by creating a new column using info from two datasets

I have two large datasets, as shown below:

df1=data.frame(subject = c(rep(1, 12), rep(2, 10)), day =c(1,1,1,1,1,2,3,15,15,15,15,19,1,1,1,1,2,3,15,15,15,15),stime=c('4/16/2012 6:25','4/16/2012 7:01','4/16/2012 17:22','4/16/2012 17:45','4/16/2012 18:13','4/18/2012 6:50','4/19/2012 6:55','5/1/2012 6:28','5/1/2012 7:00','5/1/2012 16:28','5/1/2012 17:00','5/5/2012 17:00','4/23/2012 5:56','4/23/2012 6:30','4/23/2012 16:55','4/23/2012 17:20','4/25/2012 6:32','4/26/2012 6:28','5/8/2012 5:54','5/8/2012 6:30','5/8/2012 15:55','5/8/2012 16:30'))
df2=data.frame(subject = c(rep(1, 10), rep(2, 10)), day=c(1,1,2,2,3,3,9,9,15,15,1,1,2,2,3,3,9,9,15,15),dtime=c('4/16/2012 6:15','4/16/2012 15:16','4/18/2012 7:15','4/18/2012 21:45','4/19/2012 7:05','4/19/2012 23:17','4/28/2012 7:15','4/28/2012 21:12','5/1/2012 7:15','5/1/2012 15:15','4/23/2012 6:45','4/23/2012 16:45','4/25/2012 6:45','4/25/2012 21:30','4/26/2012 6:45','4/26/2012 22:00','5/2/2012 7:00','5/2/2012 22:00','5/8/2012 6:45','5/8/2012 15:45'))

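In df2, 'dtime' contains two time points per subject per day. For each subject and day, I want to take each time point in df1 (i.e. 'stime') and subtract the second time point for that subject and day in df2. If the result is positive, that observation should get the second time point from 'dtime'; otherwise it should get the first time point. For example, for subject 1 on day 1, ('4/16/2012 6:25' - '4/16/2012 15:16') < 0, so we give this observation the first time point, '4/16/2012 6:15'; ('4/16/2012 17:22' - '4/16/2012 15:16') > 0, so we give this observation the second time point, '4/16/2012 15:16'. The expected output should look like this: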

df3=data.frame(subject = c(rep(1, 12), rep(2, 10)), day =c(1,1,1,1,1,2,3,15,15,15,15,19,1,1,1,1,2,3,15,15,15,15),stime=c('4/16/2012 6:25','4/16/2012 7:01','4/16/2012 17:22','4/16/2012 17:45','4/16/2012 18:13','4/18/2012 6:50','4/19/2012 6:55','5/1/2012 6:28','5/1/2012 7:00','5/1/2012 16:28','5/1/2012 17:00','5/5/2012 17:00','4/23/2012 5:56','4/23/2012 6:30','4/23/2012 16:55','4/23/2012 17:20','4/25/2012 6:32','4/26/2012 6:28','5/8/2012 5:54','5/8/2012 6:30','5/8/2012 15:55','5/8/2012 16:30'), dtime=c('4/16/2012 6:15','4/16/2012 6:15','4/16/2012 15:16','4/16/2012 15:16','4/16/2012 15:16','4/18/2012 7:15','4/19/2012 7:05','5/1/2012 7:15','5/1/2012 7:15','5/1/2012 15:15','5/1/2012 15:15','.','4/23/2012 6:45','4/23/2012 6:45','4/23/2012 16:45','4/23/2012 16:45','4/25/2012 6:45','4/26/2012 6:45','5/8/2012 6:45','5/8/2012 6:45','5/8/2012 15:45','5/8/2012 15:45'))

I used the code below to do this, but because there is no 'dtime' for day 19 in df2, R keeps giving me an error:

df1$dtime <- apply(df1, 1, function(x){  
                  choices <- df2[ df2$subject==as.numeric(x["subject"]) & 
                                       df2$day==as.numeric(x["day"]) , "dtime"]
         if( as.POSIXct(x["stime"], format="%m/%d/%Y %H:%M") < 
                 as.POSIXct(choices[2],format="%m/%d/%Y %H:%M") ) {
            choices[1] 
            }else{ choices[2] } 
                                  } )
Error in if (as.POSIXct(x["stime"], format = "%m/%d/%Y %H:%M") < as.POSIXct(choices[2],  : missing value where TRUE/FALSE needed

My dataset is large (about 15,000 rows and 30 columns), and some 'dtime' values are missing from df2. Does anyone know how to solve this?
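For reference, the error can be reproduced in isolation. The sketch below assumes the df2 lookup returns an empty character vector for subject 1 on day 19 (no matching rows), which makes `choices[2]` an NA; the `tz = "UTC"` argument is an addition here, only to keep the example independent of the local time zone.

```r
fmt <- "%m/%d/%Y %H:%M"

# Stand-in for the df2 lookup when no row matches (subject 1, day 19)
choices <- character(0)

t1 <- as.POSIXct("5/5/2012 17:00", format = fmt, tz = "UTC")
t2 <- as.POSIXct(choices[2], format = fmt, tz = "UTC")  # choices[2] is NA

# Comparing a valid time with NA gives NA, not TRUE or FALSE
cmp <- t1 < t2
print(is.na(cmp))

# if() needs a single TRUE or FALSE, so an NA condition raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  if (cmp) "first" else "second",
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the failure comes from the `if` condition being NA for rows of df1 that have no counterpart in df2, not from the date parsing itself.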

I think this might work for you:

df1$dtime <- apply(df1, 1, function(x) {
  choices <- as.character(df2[ df2$subject==as.numeric(x["subject"]) & 
                  df2$day==as.numeric(x["day"]), "dtime"])
  t1 <- as.POSIXct(as.character(x["stime"]), format="%m/%d/%Y %H:%M")
  t2 <- as.POSIXct(choices[2], format="%m/%d/%Y %H:%M")
  return(ifelse(( t1 < t2 ), choices[1], choices[2]))
})
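The code above avoids the error because `ifelse()` returns NA when its test is NA, whereas `if` stops. As a possible alternative for larger data, here is a minimal vectorized sketch on a small subset of the question's data. It assumes each subject/day pair in df2 has its rows in time order, so the first and last matching rows are the first and second time points; the names `key1`, `key2`, `first_idx` and `last_idx` are made up for illustration, and `tz = "UTC"` is an added assumption.

```r
fmt <- "%m/%d/%Y %H:%M"

# Small subset of the question's data; day 19 has no rows in df2
df1 <- data.frame(subject = c(1, 1, 1), day = c(1, 1, 19),
                  stime = c("4/16/2012 6:25", "4/16/2012 17:22", "5/5/2012 17:00"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(subject = c(1, 1), day = c(1, 1),
                  dtime = c("4/16/2012 6:15", "4/16/2012 15:16"),
                  stringsAsFactors = FALSE)

# One lookup key per row, built from subject and day
key1 <- paste(df1$subject, df1$day)
key2 <- paste(df2$subject, df2$day)

# match() returns the first matching position; matching against the
# reversed keys and flipping the index gives the last matching position.
# Rows of df1 with no counterpart in df2 get NA in both.
first_idx <- match(key1, key2)
last_idx  <- length(key2) + 1L - match(key1, rev(key2))

first  <- df2$dtime[first_idx]
second <- df2$dtime[last_idx]

t1 <- as.POSIXct(df1$stime, format = fmt, tz = "UTC")
t2 <- as.POSIXct(second, format = fmt, tz = "UTC")

# ifelse() propagates NA instead of raising an error
df1$dtime <- ifelse(t1 < t2, first, second)
print(df1)
```

Rows without a match end up with NA in 'dtime' rather than the '.' placeholder shown in the expected output; they can be recoded afterwards if that placeholder is needed.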
