检查r中两个独立数据帧的日期和时间是否重叠



在包含交通阻塞的数据帧中,提供了开始日期、结束日期、开始时间、结束时间、道路、方向(右或左)、开始路段和结束路段。一个示例如下:

start date    end date    start time    end time    road    direction   start road section    end road section
2015-03-02    2015-03-02     07:29        09:45      A1        R               113.1               125.7
2017-05-04    2017-05-05     23:29        01:45      A10       L               12.5                14.3

另一个大型数据帧(2000多万行)包含2015年至2020年期间所有可能的路段(以20公里为单位)和时间(以小时为单位)组合。一个示例如下:

date         time         road      direction      road section
2015-03-02  09:00-10:00   A1          R               100 - 120
2015-03-02  10:00-11:00   A1          R               100 - 120
2015-03-02  09:00-10:00   A1          R               80 - 100

我想在第二个数据帧中添加一个二进制变量,表示在某一路段当时是否有交通堵塞。在本例中,第一行的值为1,因为时间和路段范围都与第一个数据帧的第一行中的交通堵塞重叠。然而,第二行和第三行取值为0,因为时间和路段不重叠。

任何帮助和/或建议都是非常感谢的!

事实证明,这是一项相当艰巨的任务,其解决方案涉及相当一些工作。最后,这是一个美好的周日下午的锻炼。让我们希望它适用于您的完整数据集。

ex <- data.frame(start_date = c("2015-03-02", "2017-05-04"),
end_date = c("2015-03-02", "2017-05-05"), 
start_time = c("07:29", "23:29"), 
end_time = c("09:45", "01:45"), 
road = c("A1", "A10"), 
direction = c("R", "L"),
start_road_section = c(113.1, 12.5), 
end_road_section = c(125.7, 14.3)) 
df <- data.frame(date = c("2015-03-02", "2015-03-02", "2015-03-02"),
time = c("09:00-10:00", "10:00-11:00", "09:00-10:00"),
road = c("A1", "A1", "A1"), 
direction = c("R", "R", "R"), 
road_section = c("100 - 120", "100 - 120", "80 - 100"))

第一步:清理数据

df %<>% mutate(time_component = strsplit(as.character(time), "-")) %>% 
tidyr::unnest_wider(time_component, names_sep = "_")   %>%        
select(-time) %>% 
mutate(road_component = strsplit(as.character(road_section), " - ")) %>% 
tidyr::unnest_wider(road_component, names_sep = "_") %>% 
select(-road_section)
> df
# A tibble: 3 x 7
date       road  direction time_component_1 time_component_2 road_component_1 road_component_2
<chr>      <chr> <chr>     <chr>            <chr>            <chr>            <chr>           
1 2015-03-02 A1    R         09:00            10:00            100              120             
2 2015-03-02 A1    R         10:00            11:00            100              120             
3 2015-03-02 A1    R         09:00            10:00            80               100   

第二步:合并数据

comb_df <- dplyr::inner_join(ex,df, by=c("start_date" = "date", "road", "direction"))
comb_df$start_road_section <- as.numeric(comb_df$start_road_section) 
comb_df$end_road_section <- as.numeric(comb_df$end_road_section) 
comb_df$road_component_1 <- as.numeric(comb_df$road_component_1) 
comb_df$road_component_2 <- as.numeric(comb_df$road_component_2)

第三步:寻找时间与路段的重叠

new_df <- comb_df %>% mutate(ex_start = lubridate::ymd_hm(paste(start_date, start_time, sep = " ")),
ex_end = lubridate::ymd_hm(paste(end_date, end_time, sep = " ")), 
df_start = lubridate::ymd_hm(paste(start_date, time_component_1, sep = " ")),
df_end = lubridate::ymd_hm(paste(start_date, time_component_2, sep = " ")),
Intervall_ex = lubridate::interval(start = ex_start, end = ex_end),
Intervall_df = lubridate::interval(start = df_start, end = df_end), 
Time_Contained = ifelse(int_overlaps(Intervall_ex, Intervall_df), 1, 0), 
Road_Contained = ifelse(((start_road_section < road_component_1) & (end_road_section > road_component_1)) | 
((start_road_section > road_component_1) & (start_road_section < road_component_2)), 
1, 
0), 
Final_Result = ifelse(Time_Contained + Road_Contained == 2, 1, 0))

第4步也是最后一步:减少数据集

final_df <- new_df %>% select(start_date, end_date, road, 
direction, road_component_1, road_component_2, 
time_component_1, time_component_2, Final_Result)
> final_df
start_date   end_date road direction road_component_1 road_component_2 time_component_1 time_component_2 Final_Result
1 2015-03-02 2015-03-02   A1         R              100              120            09:00            10:00            1
2 2015-03-02 2015-03-02   A1         R              100              120            10:00            11:00            0
3 2015-03-02 2015-03-02   A1         R               80              100            09:00            10:00            0

最新更新