r - Predicting from nested models with nested newdata



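Imagine high-resolution time series of temperature and light recorded at many stations over many days. At each station, however, temperature and light are recorded by different sensors, so the two series end up with slightly different timestamps.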

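To merge these into a single data.frame, I have been trying to fit a light model for each day at each station in df.light. I then want to predict light values at the exact timestamps of the temperature readings, which are nested the same way in df.temp (the temperature dataset).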

station <- rep(1:5, each=36500)
dayofyear <- rep(1:365, 5, each=100)
hourofday.light <- runif(182500, min=0, max=24)
light <- runif(182500, min=0, max=40)
hourofday.temp <- runif(182500, min=0, max=24)
temp <- runif(182500, min=0, max=40)
df.light <- data.frame(station, dayofyear, hourofday.light, light)    
df.temp <- data.frame(station, dayofyear, hourofday.temp, temp)
> head(df.light)
station dayofyear hourofday.light     light
1       1         1       10.217349  0.120381
2       1         1       12.179213 12.423694
3       1         1       16.515400  7.277784
4       1         1        3.775723 31.793782
5       1         1        7.719266 30.578220
6       1         1        9.269916 16.937042
> tail(df.light)
station dayofyear hourofday.light      light
182495       5       365        4.712285 19.2047471
182496       5       365       11.190919 39.5921675
182497       5       365       18.710969 11.8182347
182498       5       365       20.288101 11.6874453
182499       5       365       15.466373  0.3264828
182500       5       365       12.969125 29.4429034
> head(df.temp)
station dayofyear hourofday.temp      temp
1       1         1     12.1298554 30.862308
2       1         1     23.6226076  9.328942
3       1         1      9.3699831 28.970397
4       1         1      0.1814767  1.405557
5       1         1     23.6300014 39.875743
6       1         1      7.6999984 39.786182

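Using dplyr, I can fit a light model (e.g. a GAM) for each day at each station in df.light. But I can't figure out how to feed the nested newdata from df.temp into the models to generate predictions per station and per day.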

library("mgcv")
library("tidyverse")
data <- as_tibble(df.light) %>%
  group_by(station, dayofyear) %>%
  nest()
models <- data %>%
  mutate(
    model = map(data, ~ gam(light ~ s(hourofday.light), data = .x)),
    predicted = map(model, ~ predict.gam(.x, newdata = hourofday.temp)) # newdata doesn't look nested
  )

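The last line, the one starting with predicted, doesn't work because newdata isn't nested... I think. Please help. I'd guess this is a common problem when merging time series generated by multiple sources.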

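You can prepare the data first. Rename the columns of df.temp to match those of df.light, so that predict.gam can find the predictor hourofday.light in newdata, and then nest both data frames by station and day. Note that this also renames temp to light in df.temp.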

names(df.temp)[3:4] <- names(df.light)[3:4]
data1 <- df.light %>% group_by(station, dayofyear) %>% nest() %>% ungroup()
data2 <- df.temp %>% group_by(station, dayofyear) %>% nest() %>% ungroup()
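
To see why the renaming matters, here is a minimal sketch for a single station-day. It assumes simulated data like in the question; the names light_day, temp_day, fit, newdata and pred are made up for illustration, and the seed is only there to make the sketch reproducible. The point is that newdata has to be a data frame with a column named like the model's predictor.

```r
library(mgcv)

set.seed(1)  # assumption: seed added only for reproducibility of this sketch

# One station-day of simulated light readings and temperature readings
light_day <- data.frame(hourofday.light = runif(100, min = 0, max = 24),
                        light = runif(100, min = 0, max = 40))
temp_day <- data.frame(hourofday.temp = runif(100, min = 0, max = 24),
                       temp = runif(100, min = 0, max = 40))

# Fit the light model for this station-day
fit <- gam(light ~ s(hourofday.light), data = light_day)

# newdata must be a data frame whose column is named like the predictor,
# so the temperature timestamps are passed in as hourofday.light
newdata <- data.frame(hourofday.light = temp_day$hourofday.temp)
pred <- predict(fit, newdata = newdata)

length(pred)  # one predicted light value per temperature timestamp
```

Passing a bare vector such as hourofday.temp, as in the question, gives predict nothing to match against the predictor name.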

Then fit the models and get the predicted values.

result <- data1 %>%
  mutate(data2 = data2$data,  # relies on data1 and data2 having the same row order
         model = map(data, ~ gam(light ~ s(hourofday.light), data = .x)),
         predicted = map2(model, data2, predict.gam))

result
# A tibble: 1,825 x 6
#   station dayofyear data               data2              model  predicted  
#     <int>     <int> <list>             <list>             <list> <list>     
# 1       1         1 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 2       1         2 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 3       1         3 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 4       1         4 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 5       1         5 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 6       1         6 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 7       1         7 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 8       1         8 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# 9       1         9 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
#10       1        10 <tibble [100 × 2]> <tibble [100 × 2]> <gam>  <dbl [100]>
# … with 1,815 more rows
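
If the two data frames might not cover the same station-days in the same order, one possible variation is to match the nested data by key with a join instead of by position, and then unnest the predictions back into a flat data frame next to the temperature readings. The following is only a sketch on a small simulated data set; the names light_nested, temp_nested, light_data, temp_data, light_pred and flat are assumptions made for illustration, not part of the code above.

```r
library(mgcv)
library(tidyverse)

set.seed(1)  # assumption: seed added only for reproducibility of this sketch

# Small simulated example: 2 stations x 3 days x 50 readings each
n <- 2 * 3 * 50
station <- rep(1:2, each = 150)
dayofyear <- rep(1:3, times = 2, each = 50)
df.light <- data.frame(station, dayofyear,
                       hourofday.light = runif(n, min = 0, max = 24),
                       light = runif(n, min = 0, max = 40))
df.temp <- data.frame(station, dayofyear,
                      hourofday.temp = runif(n, min = 0, max = 24),
                      temp = runif(n, min = 0, max = 40))

# Nest the light data per station and day
light_nested <- df.light %>%
  group_by(station, dayofyear) %>%
  nest() %>%
  ungroup() %>%
  rename(light_data = data)

# Nest the temperature data, adding a copy of the timestamps under the
# predictor name that the light model expects
temp_nested <- df.temp %>%
  mutate(hourofday.light = hourofday.temp) %>%
  group_by(station, dayofyear) %>%
  nest() %>%
  ungroup() %>%
  rename(temp_data = data)

# Match station-days by key, fit one GAM per station-day, predict light at
# the temperature timestamps, then flatten to one row per temperature reading
flat <- inner_join(light_nested, temp_nested,
                   by = c("station", "dayofyear")) %>%
  mutate(
    model = map(light_data, ~ gam(light ~ s(hourofday.light), data = .x)),
    light_pred = map2(model, temp_data,
                      ~ as.numeric(predict(.x, newdata = .y)))
  ) %>%
  select(station, dayofyear, temp_data, light_pred) %>%
  unnest(c(temp_data, light_pred)) %>%
  select(-hourofday.light)

head(flat)
```

Here flat has one row per temperature reading, with the predicted light value in light_pred alongside hourofday.temp and temp. With inner_join, station-days that appear in only one of the two data sets are dropped rather than silently misaligned.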
