Populate a new column in one table based on start and end dates from another table



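I have a larger data table (called raw.data) and a smaller data table that lists the start and end times of events.

I would like to create a new column in the larger table that marks the rows whose time falls between an event's start and end time in the smaller table. Times that do not fall between an event's start and end can be labelled as anything else; the exact label does not matter.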

#the dput of the smaller table
> dput(balldrop.times)
structure(list(Stage = 6:14, 
BallStart = structure(c(1635837081, 1635847841, 1635856675, 1635866152, 1635878326, 1635886132, 1635895547, 1635902934, 1635911136), tzone = "", class = c("POSIXct", "POSIXt")), 
BallEnd = structure(c(1635837364, 1635848243, 1635857005, 1635866475, 1635878704, 1635886465, 1635895905, 1635903786, 1635911457), tzone = "", class = c("POSIXct", "POSIXt"))), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L))
#here is part of the larger table just in case
> dput(head(raw.data, 5))
structure(list(DateTime = structure(c(1635825603.6576, 1635825604.608, 1635825605.6448, 1635825606.6816, 1635825607.632), class = c("POSIXct", "POSIXt"), tzone = "GMT"), 
Press.Well = c(1154.2561461, 1154.0308849, 1149.7247783, 1152.0544566, 1155.7363779)), 
row.names = c(NA, -5L), 
class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000020725b51ef0>)

The output I want looks like this; "Event Active" applies only to times between the DateTime values listed in the balldrop.times table:

DateTime             Press.Well  Event Status
2021-11-02 02:11:20  10          Event Not Active
2021-11-02 02:11:21  10          Event Active
2021-11-02 02:11:22  15          Event Active
2021-11-02 02:16:04  25          Event Active
2021-11-02 02:16:05  30          Event Not Active

Your code does not run as posted. The times in the sample table also do not match the times in the expected output.

tmp <- structure(list(Stage = 6:14, 
BallStart = structure(c(1635837081, 1635847841, 1635856675, 1635866152, 1635878326, 1635886132, 1635895547, 1635902934, 1635911136), tzone = "", class = c("POSIXct", "POSIXt")), 
BallEnd = structure(c(1635837364, 1635848243, 1635857005, 1635866475, 1635878704, 1635886465, 1635895905, 1635903786, 1635911457), tzone = "", class = c("POSIXct", "POSIXt"))
), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L))
tmp1 <- structure(list(DateTime = structure(c(1635825603.6576, 1635825604.608, 1635825605.6448, 1635825606.6816, 1635825607.632), class = c("POSIXct", "POSIXt"), tzone = "GMT"),
Press.Well = c(1154.2561461, 1154.0308849, 1149.7247783, 1152.0544566, 1155.7363779) ), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))

Note that this is not a clean solution.

library(dplyr)

# Check each reading against rows 1 to 5 of the reference table (tmp).
# Rows 6 to 9 would each need one more condition of the same form.
tmp1 %>% 
  mutate(`Event Status` = case_when(
    DateTime >= (tmp[1,] %>% pull(BallStart)) & DateTime <= (tmp[1,] %>% pull(BallEnd)) ~ "Event Active",
    DateTime >= (tmp[2,] %>% pull(BallStart)) & DateTime <= (tmp[2,] %>% pull(BallEnd)) ~ "Event Active",
    DateTime >= (tmp[3,] %>% pull(BallStart)) & DateTime <= (tmp[3,] %>% pull(BallEnd)) ~ "Event Active",
    DateTime >= (tmp[4,] %>% pull(BallStart)) & DateTime <= (tmp[4,] %>% pull(BallEnd)) ~ "Event Active",
    DateTime >= (tmp[5,] %>% pull(BallStart)) & DateTime <= (tmp[5,] %>% pull(BallEnd)) ~ "Event Active",
    # Anything outside all listed intervals
    TRUE ~ "Event Not Active"
  ))

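Because several conditions have to be checked, case_when is a better fit than ifelse. It lets me compare each DateTime against every row of your reference table.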

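Now, as mentioned, this is not a clean solution, because every row of the reference table needs its own line. With a larger reference table the code grows by one condition per row. But you could clean it up into a function.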
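One way such a function might look is sketched below in base R. The name label_events and its arguments are hypothetical (they are not part of the answer above), and the small readings table is made up for illustration; only the two interval boundaries are taken from balldrop.times. It compares every timestamp against all intervals, so it is a simple sketch rather than something tuned for a very large raw.data.

```r
# Hypothetical helper (not from the original answer): label each timestamp
# by whether it falls inside any [start, end] interval, bounds included.
label_events <- function(times, starts, ends,
                         active = "Event Active",
                         inactive = "Event Not Active") {
  stopifnot(length(starts) == length(ends))
  starts_num <- as.numeric(starts)
  ends_num <- as.numeric(ends)
  # For each timestamp, test it against every interval at once.
  in_any <- vapply(
    as.numeric(times),
    function(t) any(t >= starts_num & t <= ends_num),
    logical(1)
  )
  ifelse(in_any, active, inactive)
}

# Two intervals copied from balldrop.times (stages 6 and 7).
events <- data.frame(
  BallStart = as.POSIXct(c(1635837081, 1635847841), origin = "1970-01-01", tz = "GMT"),
  BallEnd   = as.POSIXct(c(1635837364, 1635848243), origin = "1970-01-01", tz = "GMT")
)

# Made-up readings placed just before, on, and just after the boundaries.
readings <- data.frame(
  DateTime = as.POSIXct(c(1635837080, 1635837081, 1635837364, 1635837365, 1635848000),
                        origin = "1970-01-01", tz = "GMT")
)

readings$Event.Status <- label_events(readings$DateTime,
                                      events$BallStart, events$BallEnd)
print(readings)
```

With the real data the call would presumably be label_events(raw.data$DateTime, balldrop.times$BallStart, balldrop.times$BallEnd), which covers all nine stages without adding a line per row. Both tables should be checked for a consistent time zone first, since the sample dput outputs use different tzone values.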
