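If I have a time range with a start date and an end date, I can easily determine whether a specific date falls within that range. How can I determine whether a specific month/day combination falls within a time range, regardless of the year?
Example
I want to know whether July 1 (07-01) falls within a time range.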
2020-01-30 - 2020-06-15 --> NO
2020-06-16 - 2021-03-20 --> YES
2013-04-26 - 2019-02-13 --> YES (multiple)
R code example
# set seed for sampling
set.seed(1)
# number of time ranges
cases <- 10
# time gaps in days
gaps <- sort(sample(x = 1:5000, size = cases, replace = TRUE))
# data frame with time ranges
df <- data.frame(dates_start = rev(Sys.Date() - gaps[2:cases] + 1),
dates_end = rev(Sys.Date() - gaps[1:(cases-1)]))
df
#> dates_start dates_end
#> 1 2009-06-26 2010-01-19
#> 2 2010-01-20 2011-06-05
#> 3 2011-06-06 2011-06-20
#> 4 2011-06-21 2013-04-21
#> 5 2013-04-22 2016-02-17
#> 6 2016-02-18 2016-08-05
#> 7 2016-08-06 2018-05-11
#> 8 2018-05-12 2019-10-09
#> 9 2019-10-10 2021-10-25
# Is specific date in date range
df$date_in_range <- df$dates_start <= lubridate::ymd("2019-07-01") &
lubridate::ymd("2019-07-01") < df$dates_end
# specific day of a month in date range
# pseudo code
data.table::between(x = month_day("07-01"),
lower = dates_start,
upper = dates_end)
#> Error in month_day("07-01"): could not find function "month_day"
# expected output
df$monthday_in_range <- c(T, T, F, T, T, T, T, T, T)
df
#> dates_start dates_end date_in_range monthday_in_range
#> 1 2009-06-26 2010-01-19 FALSE TRUE
#> 2 2010-01-20 2011-06-05 FALSE TRUE
#> 3 2011-06-06 2011-06-20 FALSE FALSE
#> 4 2011-06-21 2013-04-21 FALSE TRUE
#> 5 2013-04-22 2016-02-17 FALSE TRUE
#> 6 2016-02-18 2016-08-05 FALSE TRUE
#> 7 2016-08-06 2018-05-11 FALSE TRUE
#> 8 2018-05-12 2019-10-09 TRUE TRUE
#> 9 2019-10-10 2021-10-25 FALSE TRUE
Update 2
A standalone function that does not depend on dplyr or data.table
md_in_interval <- function(md, start, end) {
  # md is a month/day string such as "07-01"
  # does the interval cover more than a full year?
  # Then any date will fall in this interval and hence the result is TRUE
  helper <- (lubridate::year(end) - lubridate::year(start)) > 1
  # lubridate time interval, built from the function arguments
  interval <- lubridate::interval(start, end)
  # helper dates with month/day combination and start year
  my_date1 <- lubridate::mdy(paste0(md, "-", lubridate::year(start)))
  # helper dates with month/day combination and end year
  my_date2 <- lubridate::mdy(paste0(md, "-", lubridate::year(end)))
  # check if month/day combination falls within the interval
  # (namespaced call, so lubridate does not have to be attached)
  out <- lubridate::`%within%`(my_date1, interval) |
    lubridate::`%within%`(my_date2, interval) |
    helper
  return(out)
}
Usage with data.table
library(data.table)
dt <- data.table::as.data.table(df)
dt[, isin := md_in_interval("06-05", dates_start, dates_end)][]
Update
To handle ranges that span more than a year, we can use a helper column:
df %>%
  mutate(across(everything(), ymd),
         helper = ifelse(year(dates_end) - year(dates_start) > 1, 1, 0),
         interval = interval(dates_start, dates_end)) %>%
  mutate(my_date1 = mdy(paste0("07-01-", year(dates_start))),
         my_date2 = mdy(paste0("07-01-", year(dates_end)))) %>%
  mutate(check = my_date1 %within% interval | my_date2 %within% interval | helper == 1) %>%
  select(1, 2, 7)
dates_start dates_end check
1 2009-06-26 2010-01-19 TRUE
2 2010-01-20 2011-06-05 TRUE
3 2011-06-06 2011-06-20 FALSE
4 2011-06-21 2013-04-21 TRUE
5 2013-04-22 2016-02-17 TRUE
6 2016-02-18 2016-08-05 TRUE
7 2016-08-06 2018-05-11 TRUE
8 2018-05-12 2019-10-09 TRUE
9 2019-10-10 2021-10-25 TRUE
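First answer:
We can use lubridate.
We create an interval with interval(), then use %within% to check whether a day lies inside that interval.
Before that, we have to turn the month/day 07-01 into a full date by attaching a year. We use
mdy(paste0("07-01-",year(dates_start)))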
library(dplyr)
library(lubridate)
df %>%
  mutate(across(everything(), ymd),
         interval = interval(dates_start, dates_end)) %>%
  mutate(my_date = mdy(paste0("07-01-", year(dates_start)))) %>%
  mutate(check = my_date %within% interval)
dates_start dates_end interval my_date check
1 2009-06-26 2010-01-19 2009-06-26 UTC--2010-01-19 UTC 2009-07-01 TRUE
2 2010-01-20 2011-06-05 2010-01-20 UTC--2011-06-05 UTC 2010-07-01 TRUE
3 2011-06-06 2011-06-20 2011-06-06 UTC--2011-06-20 UTC 2011-07-01 FALSE
4 2011-06-21 2013-04-21 2011-06-21 UTC--2013-04-21 UTC 2011-07-01 TRUE
5 2013-04-22 2016-02-17 2013-04-22 UTC--2016-02-17 UTC 2013-07-01 TRUE
6 2016-02-18 2016-08-05 2016-02-18 UTC--2016-08-05 UTC 2016-07-01 TRUE
7 2016-08-06 2018-05-11 2016-08-06 UTC--2018-05-11 UTC 2016-07-01 FALSE
8 2018-05-12 2019-10-09 2018-05-12 UTC--2019-10-09 UTC 2018-07-01 TRUE
9 2019-10-10 2021-10-25 2019-10-10 UTC--2021-10-25 UTC 2019-07-01 FALSE
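Rows 7 and 9 come out FALSE here because only the start year is tried as the candidate year. As a rough base R sketch (not part of the original answer; the variable names are illustrative), here is row 7 checked against July 1 of every year the range touches:

```r
# Row 7 of the example data: 2016-08-06 to 2018-05-11
start <- as.Date("2016-08-06")
end   <- as.Date("2018-05-11")

# Every calendar year touched by the range
years <- as.integer(format(start, "%Y")):as.integer(format(end, "%Y"))

# July 1 in each of those years
candidates <- as.Date(paste0(years, "-07-01"))

# Which candidates lie inside the range (inclusive on both ends)?
inside <- candidates >= start & candidates <= end

# Only the 2017 candidate is inside: 2016-07-01 is before the start
# and 2018-07-01 is after the end
data.frame(year = years, candidate = candidates, inside = inside)
```

A check that uses the start year alone (or even the start and end years alone) misses the 2017 match, so ranges spanning several years need extra handling.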
You may try
library(lubridate)
library(dplyr)
df %>%
  rowwise() %>%
  mutate(monthday_in_range = 7 %in% month(seq(floor_date(dates_start, "month"), dates_end, by = "month")))
dates_start dates_end monthday_in_range
<date> <date> <lgl>
1 2009-06-26 2010-01-19 TRUE
2 2010-01-20 2011-06-05 TRUE
3 2011-06-06 2011-06-20 FALSE
4 2011-06-21 2013-04-21 TRUE
5 2013-04-22 2016-02-17 TRUE
6 2016-02-18 2016-08-05 TRUE
7 2016-08-06 2018-05-11 TRUE
8 2018-05-12 2019-10-09 TRUE
9 2019-10-10 2021-10-25 TRUE
Addition
df %>%
  rowwise() %>%
  mutate(monthday_in_range = 7 %in% month(seq(ymd(paste0(substr(dates_start, 1, 8), "13")), dates_end, by = "month")))
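Note that the month-based checks above only test whether July occurs in the range, so a range such as 2011-07-02 to 2011-07-20 would also be flagged even though it does not contain July 1. If the exact day matters, one possible base R sketch is shown below. The function name md_in_range and its arguments are hypothetical, not taken from any package; it assumes md is given as "MM-DD" and that both range ends are inclusive.

```r
# Hypothetical helper: does the month/day `md` ("MM-DD") occur between
# `start` and `end` (inclusive), in any year? Vectorised over the ranges.
md_in_range <- function(md, start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  vapply(seq_along(start), function(i) {
    # every calendar year touched by this range
    years <- as.integer(format(start[i], "%Y")):as.integer(format(end[i], "%Y"))
    # the month/day in each of those years; invalid dates such as
    # Feb 29 in a non-leap year become NA and are ignored below
    candidates <- as.Date(paste0(years, "-", md), format = "%Y-%m-%d")
    any(candidates >= start[i] & candidates <= end[i], na.rm = TRUE)
  }, logical(1))
}

# The three ranges from the question
md_in_range("07-01",
            start = c("2020-01-30", "2020-06-16", "2013-04-26"),
            end   = c("2020-06-15", "2021-03-20", "2019-02-13"))
#> [1] FALSE  TRUE  TRUE
```

With the example data this could be called as md_in_range("07-01", df$dates_start, df$dates_end). Generating one candidate per year avoids the special case for ranges longer than a year, at the cost of a loop over rows.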