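Imagine a high-resolution time series of temperature and light, recorded over many days at many locations (stations). The catch is that at each station, temperature and light are measured by different sensors, which produces slightly different sets of timestamps.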
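To merge these into one `data.frame`, I've been trying to build a light model for each day at each station in `df.light`. I then want to predict light values at the exact timestamps of the temperature readings, which are nested the same way in `df.temp` (the temperature data set).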
```r
# 5 stations x 365 days x 100 readings per day = 182,500 rows per sensor
station <- rep(1:5, each=36500)
dayofyear <- rep(1:365, 5, each=100)
hourofday.light <- runif(182500, min=0, max=24)
light <- runif(182500, min=0, max=40)
hourofday.temp <- runif(182500, min=0, max=24)
temp <- runif(182500, min=0, max=40)
df.light <- data.frame(station, dayofyear, hourofday.light, light)
df.temp <- data.frame(station, dayofyear, hourofday.temp, temp)
```
```
> head(df.light)
  station dayofyear hourofday.light     light
1       1         1       10.217349  0.120381
2       1         1       12.179213 12.423694
3       1         1       16.515400  7.277784
4       1         1        3.775723 31.793782
5       1         1        7.719266 30.578220
6       1         1        9.269916 16.937042
> tail(df.light)
       station dayofyear hourofday.light      light
182495       5       365        4.712285 19.2047471
182496       5       365       11.190919 39.5921675
182497       5       365       18.710969 11.8182347
182498       5       365       20.288101 11.6874453
182499       5       365       15.466373  0.3264828
182500       5       365       12.969125 29.4429034
> head(df.temp)
  station dayofyear hourofday.temp      temp
1       1         1     12.1298554 30.862308
2       1         1     23.6226076  9.328942
3       1         1      9.3699831 28.970397
4       1         1      0.1814767  1.405557
5       1         1     23.6300014 39.875743
6       1         1      7.6999984 39.786182
```
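I can use `dplyr` to build a light model for each day at each station in `df.light`, for example a GAM. But I'm stuck on how to feed the nested `newdata` from `df.temp` into the models to produce predictions per station per day.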
```r
library("mgcv")
library("tidyverse")

data <- as_tibble(df.light) %>%
  group_by(station, dayofyear) %>%
  nest()

models <- data %>%
  mutate(
    model = map(data, ~ gam(light ~ s(hourofday.light), data = .x)),
    predicted = map(model, ~ predict.gam(.x, newdata = hourofday.temp)) # newdata doesn't look nested
  )
```
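The last line, starting with `predicted`, doesn't work because `newdata` isn't nested... I think. Please help. I'd guess this is a common problem when merging time series produced by multiple sources.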
You can prepare the data first.
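```r
# Give the temperature columns the same names as the light columns,
# so that predict.gam() finds `hourofday.light` in the new data
names(df.temp)[3:4] <- names(df.light)[3:4]

# Nest both data sets by station and day
data1 <- df.light %>% group_by(station, dayofyear) %>% nest() %>% ungroup()
data2 <- df.temp %>% group_by(station, dayofyear) %>% nest() %>% ungroup()
```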
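The next step attaches `data2$data` to `data1` by position, which is only correct if both nested tibbles list the same `(station, dayofyear)` groups in the same order. A minimal sketch of a sanity check for that, assuming `dplyr` and `tidyr` are installed; the small synthetic data (2 stations x 3 days x 20 readings) and the name `keys_match` are illustrative only:

```r
library(dplyr)
library(tidyr)

# Small synthetic stand-ins for df.light / df.temp (2 stations x 3 days x 20 readings)
set.seed(42)
n <- 2 * 3 * 20
df.light <- data.frame(station = rep(1:2, each = 60),
                       dayofyear = rep(1:3, 2, each = 20),
                       hourofday.light = runif(n, 0, 24),
                       light = runif(n, 0, 40))
df.temp <- data.frame(station = rep(1:2, each = 60),
                      dayofyear = rep(1:3, 2, each = 20),
                      hourofday.temp = runif(n, 0, 24),
                      temp = runif(n, 0, 40))
names(df.temp)[3:4] <- names(df.light)[3:4]

data1 <- df.light %>% group_by(station, dayofyear) %>% nest() %>% ungroup()
data2 <- df.temp %>% group_by(station, dayofyear) %>% nest() %>% ungroup()

# The grouping keys must match row for row before data2$data can be
# attached to data1 by position
keys_match <- identical(select(data1, station, dayofyear),
                        select(data2, station, dayofyear))
stopifnot(keys_match)
```

If the check fails (for example because one sensor has no readings on some day), pairing by position silently mixes up days; joining on the keys avoids that.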
Then apply the models to get the predicted values.
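```r
result <- data1 %>%
  mutate(data2 = data2$data,  # assumes data1 and data2 list the groups in the same order
         model = map(data, ~ gam(light ~ s(hourofday.light), data = .x)),
         # map2() passes each nested temperature tibble as predict.gam()'s
         # second argument, `newdata`
         predicted = map2(model, data2, predict.gam))
```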
```
result
# A tibble: 1,825 x 6
#   station dayofyear data               data2              model  predicted
#     <int>     <int> <list>             <list>             <list> <list>
# 1       1         1 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 2       1         2 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 3       1         3 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 4       1         4 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 5       1         5 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 6       1         6 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 7       1         7 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 8       1         8 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 9       1         9 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
#10       1        10 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# … with 1,815 more rows
```
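The approach above pairs the nested tables by position and renames the temperature columns so that `predict.gam()` finds `hourofday.light`. A minimal sketch of a variant that instead pairs the groups with `inner_join()` on `station` and `dayofyear`, keeps the original temperature column names, and ends with `unnest()` to give the single merged data frame the question asked for. It assumes `dplyr`, `tidyr` (1.0 or later `nest()`/`unnest()` syntax), `purrr` and `mgcv` are installed; the list-column names `light_data`, `temp_data` and `light_pred` and the small synthetic data are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(mgcv)

# Small synthetic stand-ins (2 stations x 3 days x 50 readings each)
set.seed(1)
n <- 2 * 3 * 50
df.light <- data.frame(station = rep(1:2, each = 150),
                       dayofyear = rep(1:3, 2, each = 50),
                       hourofday.light = runif(n, 0, 24),
                       light = runif(n, 0, 40))
df.temp <- data.frame(station = rep(1:2, each = 150),
                      dayofyear = rep(1:3, 2, each = 50),
                      hourofday.temp = runif(n, 0, 24),
                      temp = runif(n, 0, 40))

light_nested <- as_tibble(df.light) %>% nest(light_data = c(hourofday.light, light))
temp_nested  <- as_tibble(df.temp)  %>% nest(temp_data  = c(hourofday.temp, temp))

merged <- light_nested %>%
  # Pair the two sources by key instead of by row position
  inner_join(temp_nested, by = c("station", "dayofyear")) %>%
  mutate(
    model = map(light_data, ~ gam(light ~ s(hourofday.light), data = .x)),
    # The model looks up its predictor by name, so pass the temperature
    # timestamps under the column name the model was fitted with
    light_pred = map2(model, temp_data, ~ as.numeric(predict(
      .x, newdata = data.frame(hourofday.light = .y$hourofday.temp))))
  ) %>%
  select(station, dayofyear, temp_data, light_pred) %>%
  # One row per temperature reading, with the modelled light value alongside
  unnest(c(temp_data, light_pred))
```

An inner join drops any station-day that appears in only one source, which is usually what you want here: without light readings there is no model to fit, and without temperature readings there is nothing to predict at.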