根据r中的日期范围合并数据帧



我有两个数据帧,我需要根据ID和日期范围合并。df1有SiteID和date列,df2有SiteID、Start date和End date列。我希望结束一个df,包括SiteID,日期,开始日期和结束日期。

df1例子:

DateTime SiteID
1 2010-07-25 01:06:55    B04
2 2011-05-10 23:52:14    B04
3 2011-09-17 01:14:30    B04
4 2012-04-04 02:55:29    B05
5 2013-01-05 23:03:06    B05
6 2011-03-09 20:39:46    B06
7 2012-07-25 23:17:19    B07
8 2011-03-03 00:46:45    B08

df2例子:

Site.ID Start.date   End.date
1      B04 2010-07-18 2010-08-24
2      B04 2011-02-22 2011-07-23
3      B04 2011-08-30 2012-10-03
4      B04 2012-10-20 2013-04-08
5      B05 2011-08-30 2012-08-21
6      B05 2012-12-08 2013-01-21
7      B05 2013-02-08 2013-04-08
8      B06 2010-07-20 2010-09-03
9      B06 2011-02-12 2011-04-18
10     B06 2011-05-13 2011-05-16
11     B07 2011-10-24 2011-11-29
12     B07 2011-12-29 2012-12-02
13     B08 2011-02-12 2011-04-01
14     B08 2011-10-24 2011-12-24

我想要创建的:

DateTime SiteID Start.date   End.date
1 2010-07-25 01:06:55    B04 2010-07-18 2010-08-24
2 2011-05-10 23:52:14    B04 2011-02-22 2011-07-23
3 2011-09-17 01:14:30    B04 2011-08-30 2012-10-03
4 2012-04-04 02:55:29    B05 2011-08-30 2012-08-21
5 2013-01-05 23:03:06    B05 2012-12-08 2013-01-21
6 2011-03-09 20:39:46    B06 2011-02-12 2011-04-18
7 2012-07-25 23:17:19    B07 2011-12-29 2012-12-02
8 2011-03-03 00:46:45    B08 2011-02-12 2011-04-01

假设您将lubridate用于DateTime,您可以执行

df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>% 
filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#>   DateTime            SiteID Start.date End.date  
#>   <dttm>              <chr>  <date>     <date>    
#> 1 2010-07-25 01:06:55 B04    2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04    2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04    2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05    2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05    2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06    2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07    2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08    2011-02-12 2011-04-01

完整的示例:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

df1 <- tribble(~DateTime, ~SiteID,
"2010-07-25 01:06:55", "B04",
"2011-05-10 23:52:14", "B04",
"2011-09-17 01:14:30", "B04",
"2012-04-04 02:55:29", "B05",
"2013-01-05 23:03:06", "B05",
"2011-03-09 20:39:46", "B06",
"2012-07-25 23:17:19", "B07",
"2011-03-03 00:46:45", "B08") %>% 
mutate(DateTime = as_datetime(DateTime))

df2 <- tribble(~Site.ID, ~Start.date, ~End.date,
"B04", "2010-07-18", "2010-08-24",
"B04", "2011-02-22", "2011-07-23",
"B04", "2011-08-30", "2012-10-03",
"B04", "2012-10-20", "2013-04-08",
"B05", "2011-08-30", "2012-08-21",
"B05", "2012-12-08", "2013-01-21",
"B05", "2013-02-08", "2013-04-08",
"B06", "2010-07-20", "2010-09-03",
"B06", "2011-02-12", "2011-04-18",
"B06", "2011-05-13", "2011-05-16",
"B07", "2011-10-24", "2011-11-29",
"B07", "2011-12-29", "2012-12-02",
"B08", "2011-02-12", "2011-04-01",
"B08", "2011-10-24", "2011-12-24") %>% 
mutate(Start.date = as_date(Start.date), End.date = as_date(End.date))

df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>% 
filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#>   DateTime            SiteID Start.date End.date  
#>   <dttm>              <chr>  <date>     <date>    
#> 1 2010-07-25 01:06:55 B04    2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04    2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04    2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05    2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05    2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06    2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07    2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08    2011-02-12 2011-04-01

由reprex包(v2.0.1)创建于2022-06-09

相关内容

  • 没有找到相关文章

最新更新