r - Filtering time-series data using sunrise and sunset times



I'm trying to subset my data to the period between sunrise and sunset. Data:

library(tidyverse)
library(lubridate)
library(suncalc)
dat <- tibble(datetime = seq(as.POSIXct('2020-08-03 00:00:00'),
                             as.POSIXct('2020-08-09 12:00:00'),
                             by = 3600),
              var1 = rnorm(157, 2, 1),
              var2 = rnorm(157, 3, 5)) %>%
  mutate(getSunlightTimes(date = as.Date(datetime, format = '%m/%d/%Y'),
                          lat = 43.1, lon = -76.2, tz = 'America/New_York',
                          keep = c('sunrise', 'sunset'))) %>%
  select(c(datetime, var1, var2, sunrise, sunset))

Then I want to subset the data so that only rows whose datetime falls between sunrise and sunset on that same day are kept. I tried:

myrange <- as.interval(unique(dat$sunrise), unique(dat$sunset))
dat <- dat %>% 
  filter(datetime %within% myrange)

This runs, but it throws a warning and does not include all of the rows it should. Thanks in advance.
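The warning is consistent with R's vector recycling (an explanation inferred here, not confirmed in the question): myrange holds one interval per day (7 of them), while datetime has 157 values, so the comparison pairs each row with whichever interval recycling lands on instead of its own day's interval, and R warns because 157 is not a multiple of 7. The base-R sketch below reproduces the effect on hypothetical toy data (three made-up sunrise/sunset pairs in UTC) and shows a per-row lookup by date that avoids it.

```r
# Hypothetical toy data: 70 hourly timestamps (UTC) spanning three days,
# plus made-up sunrise/sunset times, one pair per day.
datetime <- seq(as.POSIXct("2020-08-03 00:00:00", tz = "UTC"),
                by = 3600, length.out = 70)
sunrise <- as.POSIXct(c("2020-08-03 06:00:00", "2020-08-04 06:00:00",
                        "2020-08-05 06:00:00"), tz = "UTC")
sunset <- sunrise + 14 * 3600  # 20:00 each day

# Naive: 70 values vs 3 bounds. R recycles the bounds (row 1 vs day 1,
# row 2 vs day 2, row 3 vs day 3, row 4 vs day 1, ...) and warns.
warn_msg <- tryCatch(datetime >= sunrise, warning = conditionMessage)
naive <- suppressWarnings(datetime >= sunrise & datetime <= sunset)

# Per-row lookup: find each row's own day, then compare with that day only.
day_idx <- match(as.Date(datetime, tz = "UTC"), as.Date(sunrise, tz = "UTC"))
keep <- datetime >= sunrise[day_idx] & datetime <= sunset[day_idx]

warn_msg    # the recycling warning message
sum(naive)  # 15: recycling keeps only every third daytime row
sum(keep)   # 45: 06:00 to 20:00 inclusive on each of the three days
```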

Try this:

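First, create the part of dat that doesn't depend on the sunlight times. I add a date column because we need it both to compute the sunrise/sunset times and to join them back onto the data.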

set.seed(42)
dat <- tibble(datetime = seq(as.POSIXct('2020-08-03 00:00:00'),
                             as.POSIXct('2020-08-09 12:00:00'),
                             by = 3600),
              var1 = rnorm(157, 2, 1),
              var2 = rnorm(157, 3, 5)) %>%
  mutate(date = as.Date(datetime))
dat
# # A tibble: 157 x 4
#    datetime             var1   var2 date      
#    <dttm>              <dbl>  <dbl> <date>    
#  1 2020-08-03 00:00:00  3.37 -1.00  2020-08-03
#  2 2020-08-03 01:00:00  1.44  0.333 2020-08-03
#  3 2020-08-03 02:00:00  2.36  9.44  2020-08-03
#  4 2020-08-03 03:00:00  2.63  2.12  2020-08-03
#  5 2020-08-03 04:00:00  2.40 -2.36  2020-08-03
#  6 2020-08-03 05:00:00  1.89  3.82  2020-08-03
#  7 2020-08-03 06:00:00  3.51  1.19  2020-08-03
#  8 2020-08-03 07:00:00  1.91  5.95  2020-08-03
#  9 2020-08-03 08:00:00  4.02 10.2   2020-08-03
# 10 2020-08-03 09:00:00  1.94 -1.96  2020-08-03
# # ... with 147 more rows
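One detail worth checking (an addition, not part of the original answer): the date column is what links each row to its sunrise/sunset, and as.Date() on a POSIXct has historically defaulted to tz = "UTC" rather than the session's time zone. If the timestamps are local times, late-evening rows can then be assigned the next day's date. Passing tz explicitly avoids relying on that default, as this small base-R check shows:

```r
# 21:00 Eastern Daylight Time on 3 Aug 2020 is 01:00 UTC on 4 Aug.
x <- as.POSIXct("2020-08-03 21:00:00", tz = "America/New_York")

local_date <- as.Date(x, tz = "America/New_York")  # 2020-08-03
utc_date   <- as.Date(x, tz = "UTC")               # 2020-08-04
local_date
utc_date
```

In the answer's code, one option (assuming the readings really are New York local times) is to create datetime with tz = 'America/New_York' and compute date with as.Date(datetime, tz = 'America/New_York'), so both agree with the tz passed to getSunlightTimes().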

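Now reduce it to the distinct dates, get the sunrise/sunset times for those dates, and join them back onto the original dat (using left_join). After that, we can filter as needed: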

distinct(dat, date) %>%
  with(., getSunlightTimes(date = date,
                           lat = 43.1, lon = -76.2, tz = 'America/New_York',
                           keep = c('sunrise', 'sunset'))) %>%
  left_join(dat, ., by = "date") %>%
  filter(sunrise <= datetime, datetime <= sunset)
# # A tibble: 85 x 8
#    datetime             var1  var2 date         lat   lon sunrise             sunset             
#    <dttm>              <dbl> <dbl> <date>     <dbl> <dbl> <dttm>              <dttm>             
#  1 2020-08-03 06:00:00 3.51   1.19 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  2 2020-08-03 07:00:00 1.91   5.95 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  3 2020-08-03 08:00:00 4.02  10.2  2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  4 2020-08-03 09:00:00 1.94  -1.96 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  5 2020-08-03 10:00:00 3.30   5.27 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  6 2020-08-03 11:00:00 4.29   3.42 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  7 2020-08-03 12:00:00 0.611  7.48 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  8 2020-08-03 13:00:00 1.72   1.85 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
#  9 2020-08-03 14:00:00 1.87   7.18 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
# 10 2020-08-03 15:00:00 2.64  -5.73 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
# # ... with 75 more rows
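For readers not using the tidyverse, the same pattern (distinct dates, a per-date lookup table, a left join, then a row-wise filter) can be sketched in base R with merge(). This sketch skips suncalc: the 2020-08-03 times are the ones printed in the output above, the 2020-08-04 values are hypothetical placeholders, and everything is kept in America/New_York (an assumption) so that date is derived in the same zone as the sunlight times.

```r
tz <- "America/New_York"

# Two days of hourly readings (no data columns needed for the sketch).
dat <- data.frame(
  datetime = seq(as.POSIXct("2020-08-03 00:00:00", tz = tz),
                 by = 3600, length.out = 48)
)
dat$date <- as.Date(dat$datetime, tz = tz)

# One row per distinct date. 2020-08-03 uses the times printed above;
# 2020-08-04 values are hypothetical placeholders, not suncalc output.
sun <- data.frame(
  date    = as.Date(c("2020-08-03", "2020-08-04")),
  sunrise = as.POSIXct(c("2020-08-03 05:59:13", "2020-08-04 06:00:00"), tz = tz),
  sunset  = as.POSIXct(c("2020-08-03 20:25:11", "2020-08-04 20:24:00"), tz = tz)
)
stopifnot(setequal(sun$date, unique(dat$date)))

# Left join on date, then keep rows inside that row's own daylight window.
joined <- merge(dat, sun, by = "date", all.x = TRUE)
daylight <- joined[joined$sunrise <= joined$datetime &
                     joined$datetime <= joined$sunset, ]
nrow(daylight)  # 30: hours 06:00 to 20:00 on each day
```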

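Note that we can't use dplyr::between here, because (at least in dplyr versions before 1.1.0) that function only uses the first element of its left and right arguments.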
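The difference can be seen with plain comparison operators, which R applies element-wise: the filter above pairs each row with its own sunrise and sunset, whereas bounds that collapse to their first element test every row against the first day's window. A base-R sketch with made-up times:

```r
# Three rows, each with its own day's sunrise/sunset (hypothetical values).
datetime <- as.POSIXct(c("2020-08-03 12:00:00", "2020-08-04 05:00:00",
                         "2020-08-05 12:00:00"), tz = "UTC")
sunrise <- as.POSIXct(c("2020-08-03 06:00:00", "2020-08-04 06:00:00",
                        "2020-08-05 06:00:00"), tz = "UTC")
sunset <- sunrise + 14 * 3600

# Element-wise: row i is compared with sunrise[i] and sunset[i].
rowwise <- sunrise <= datetime & datetime <= sunset
# Scalar bounds: every row is compared with the first day's window only.
first_only <- sunrise[1] <= datetime & datetime <= sunset[1]

rowwise     # TRUE FALSE TRUE
first_only  # TRUE FALSE FALSE
```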
