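I have two data frames that I need to merge by ID and date range. df1 has SiteID and DateTime columns; df2 has Site.ID, Start.date and End.date columns. I want to end up with a single data frame containing SiteID, the DateTime, and the matching Start.date and End.date.
df1 example: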
DateTime SiteID
1 2010-07-25 01:06:55 B04
2 2011-05-10 23:52:14 B04
3 2011-09-17 01:14:30 B04
4 2012-04-04 02:55:29 B05
5 2013-01-05 23:03:06 B05
6 2011-03-09 20:39:46 B06
7 2012-07-25 23:17:19 B07
8 2011-03-03 00:46:45 B08
df2 example:
Site.ID Start.date End.date
1 B04 2010-07-18 2010-08-24
2 B04 2011-02-22 2011-07-23
3 B04 2011-08-30 2012-10-03
4 B04 2012-10-20 2013-04-08
5 B05 2011-08-30 2012-08-21
6 B05 2012-12-08 2013-01-21
7 B05 2013-02-08 2013-04-08
8 B06 2010-07-20 2010-09-03
9 B06 2011-02-12 2011-04-18
10 B06 2011-05-13 2011-05-16
11 B07 2011-10-24 2011-11-29
12 B07 2011-12-29 2012-12-02
13 B08 2011-02-12 2011-04-01
14 B08 2011-10-24 2011-12-24
What I want to create:
DateTime SiteID Start.date End.date
1 2010-07-25 01:06:55 B04 2010-07-18 2010-08-24
2 2011-05-10 23:52:14 B04 2011-02-22 2011-07-23
3 2011-09-17 01:14:30 B04 2011-08-30 2012-10-03
4 2012-04-04 02:55:29 B05 2011-08-30 2012-08-21
5 2013-01-05 23:03:06 B05 2012-12-08 2013-01-21
6 2011-03-09 20:39:46 B06 2011-02-12 2011-04-18
7 2012-07-25 23:17:19 B07 2011-12-29 2012-12-02
8 2011-03-03 00:46:45 B08 2011-02-12 2011-04-01
Assuming you are using lubridate for DateTime, you can do:
df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>%
filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#> DateTime SiteID Start.date End.date
#> <dttm> <chr> <date> <date>
#> 1 2010-07-25 01:06:55 B04 2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04 2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04 2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05 2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05 2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06 2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07 2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08 2011-02-12 2011-04-01
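One caveat with this approach: when interval() is built from two Date values, the end point is midnight at the start of End.date, so a DateTime later on the End.date day falls outside the interval. The small check below uses only lubridate; the two timestamps are made-up values chosen to show the boundary and do not come from the data above.

```r
library(lubridate)

# Interval built from two Dates: from 2010-07-18 00:00:00 UTC to 2010-08-24 00:00:00 UTC
iv <- interval(ymd("2010-07-18"), ymd("2010-08-24"))

# Exactly on the end instant: counted as within the interval
at_midnight <- ymd_hms("2010-08-24 00:00:00") %within% iv

# Later on the same calendar day: past the end instant, so not within
later_same_day <- ymd_hms("2010-08-24 10:00:00") %within% iv

print(c(at_midnight = at_midnight, later_same_day = later_same_day))
```

If End.date should include the whole day, compare on as_date(DateTime) instead, or extend the end of the interval by one day.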
Full example:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
df1 <- tribble(~DateTime, ~SiteID,
"2010-07-25 01:06:55", "B04",
"2011-05-10 23:52:14", "B04",
"2011-09-17 01:14:30", "B04",
"2012-04-04 02:55:29", "B05",
"2013-01-05 23:03:06", "B05",
"2011-03-09 20:39:46", "B06",
"2012-07-25 23:17:19", "B07",
"2011-03-03 00:46:45", "B08") %>%
mutate(DateTime = as_datetime(DateTime))
df2 <- tribble(~Site.ID, ~Start.date, ~End.date,
"B04", "2010-07-18", "2010-08-24",
"B04", "2011-02-22", "2011-07-23",
"B04", "2011-08-30", "2012-10-03",
"B04", "2012-10-20", "2013-04-08",
"B05", "2011-08-30", "2012-08-21",
"B05", "2012-12-08", "2013-01-21",
"B05", "2013-02-08", "2013-04-08",
"B06", "2010-07-20", "2010-09-03",
"B06", "2011-02-12", "2011-04-18",
"B06", "2011-05-13", "2011-05-16",
"B07", "2011-10-24", "2011-11-29",
"B07", "2011-12-29", "2012-12-02",
"B08", "2011-02-12", "2011-04-01",
"B08", "2011-10-24", "2011-12-24") %>%
mutate(Start.date = as_date(Start.date), End.date = as_date(End.date))
df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>%
filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#> DateTime SiteID Start.date End.date
#> <dttm> <chr> <date> <date>
#> 1 2010-07-25 01:06:55 B04 2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04 2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04 2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05 2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05 2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06 2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07 2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08 2011-02-12 2011-04-01
Created on 2022-06-09 by the reprex package (v2.0.1)
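If you have dplyr 1.1.0 or later (an assumption; older versions lack join_by()), a non-equi join with join_by() and between() can do the date matching inside the join, instead of first pairing every df1 row with every df2 row for the same site and then filtering. This is a minimal sketch, not the answer's method: the helper column Date is hypothetical, added so the comparison happens on calendar days, which also makes End.date inclusive for the whole day. For this data it should return the same 8 rows as the %within% version.

```r
library(dplyr)      # join_by() with between() needs dplyr >= 1.1.0
library(lubridate)

df1 <- tibble(
  DateTime = ymd_hms(c("2010-07-25 01:06:55", "2011-05-10 23:52:14",
                       "2011-09-17 01:14:30", "2012-04-04 02:55:29",
                       "2013-01-05 23:03:06", "2011-03-09 20:39:46",
                       "2012-07-25 23:17:19", "2011-03-03 00:46:45")),
  SiteID = c("B04", "B04", "B04", "B05", "B05", "B06", "B07", "B08")
)

df2 <- tibble(
  Site.ID = c(rep("B04", 4), rep("B05", 3), rep("B06", 3),
              rep("B07", 2), rep("B08", 2)),
  Start.date = ymd(c("2010-07-18", "2011-02-22", "2011-08-30", "2012-10-20",
                     "2011-08-30", "2012-12-08", "2013-02-08",
                     "2010-07-20", "2011-02-12", "2011-05-13",
                     "2011-10-24", "2011-12-29",
                     "2011-02-12", "2011-10-24")),
  End.date = ymd(c("2010-08-24", "2011-07-23", "2012-10-03", "2013-04-08",
                   "2012-08-21", "2013-01-21", "2013-04-08",
                   "2010-09-03", "2011-04-18", "2011-05-16",
                   "2011-11-29", "2012-12-02",
                   "2011-04-01", "2011-12-24"))
)

res <- df1 %>%
  # Hypothetical helper column: the calendar day of each timestamp (UTC here)
  mutate(Date = as_date(DateTime)) %>%
  # Match on site, then keep only pairs where Start.date <= Date <= End.date
  inner_join(df2, by = join_by(SiteID == Site.ID,
                               between(Date, Start.date, End.date))) %>%
  select(DateTime, SiteID, Start.date, End.date)

print(res)
```

inner_join() drops df1 rows that fall in no period; use left_join() instead if you want to keep them with NA dates.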