r - Best way to incorporate an offset variable into a raster for Poisson regression prediction



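This may be a naive question, but I could not find a solution. I have a data frame of count data from field surveys, and I want to predict species richness with Poisson regression. The surveys were assigned to grid cells of equal size, but a different number of surveys was carried out in each cell, so I want to include "number of surveys per grid cell" as an offset. The problem is that when I predict from the glm output using a raster stack, it expects a raster layer for the offset variable (number of surveys per grid cell). My question is how to incorporate the offset variable into the raster stack so that I can produce a spatial prediction (i.e., the prediction should be a raster file). Below is my reproducible attempt (using fewer variables):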

Create the data frame:

bio2 <- c(12.74220, 14.10092, 13.82644, 14.30550, 15.02780, 14.88224, 13.98853, 14.89524, 15.59887, 13.98664, 14.75405,
15.38178, 14.50719, 15.00427, 12.77741, 13.25432, 12.91208, 15.75312, 15.36683, 13.33202, 12.55190, 14.94755,
13.52424, 14.75273, 14.42298, 15.37897, 12.02472, 15.49786, 14.28823, 13.01982, 13.60521, 15.07687, 14.17427,
13.24491, 14.84833, 13.52594, 13.92113, 11.39738, 14.31446, 12.10239)
bio9 <- c(26.30980, 26.52826, 27.03376, 23.93621, 26.48416, 26.05859, 25.37550, 25.34595, 25.34056, 23.37793, 25.74681,
22.72016, 22.00458, 24.37140, 22.95169, 24.52542, 24.63087, 22.86291, 23.10240, 23.79215, 24.86875, 21.40718,
23.84258, 21.91964, 25.97682, 24.97625, 22.31471, 19.64094, 23.93386, 25.87234, 25.99514, 17.17149, 20.72802,
18.22862, 24.51112, 24.33626, 23.90822, 23.43660, 23.07425, 20.71244)
count <- c(37, 144,  91,  69,  36,  32,  14,  34,  48, 168,  15,  21,  36,  29,  24,  16,  14,  11,  18,  64,  37,  31,  18,   9,   4,
16,  14,  10,  14,  43,  18,  88,  69,  26,  20,   5,   9,  75,   8,  26)
sitesPerGrid <- c(3, 16, 8,  5,  3,  3,  1,  3,  3, 29,  2,  4,  5,  2,  3,  4,  2,  1,  2,  9,  6,  3,  3,  2,  1,  2,  2,  1,  2,  5,  7, 15,  9,  4,
1,  1,  2, 22,  6,  5)
testdf <- data.frame(bio2, bio9, count, sitesPerGrid)
pois1 <- glm(count ~ bio2 + bio9, offset = log(sitesPerGrid), family = poisson (link = "log"), data = testdf)

Spatial prediction:

library(raster)
bio_2 <- bio_9 <- raster(nrow=5,ncol=8, xmn=0, xmx=1,ymn=0,ymx=1)
values(bio_2) <- bio2
values(bio_9) <- bio9
predRas <- stack(bio_2, bio_9)
names(predRas) <- c("bio2", "bio9")
pdPois <- raster::predict(predRas, pois1, type = "response")
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
#  variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 16 rows but variables found have 40 rows 


I get an error because it expects a raster layer for sitesPerGrid. But I do not want to use sitesPerGrid as a predictor.

Update

Based on the comments and the answer from @robertHijmans, I tried the following code:

pdPois <- raster::predict(predRas, pois1, const = testdf[, "sitesPerGrid"], type = "response")

I again get the following error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 143811, 40

I see that this works, because the number of data points is the same as the number used to fit the model:

p <- predict(pois1, as.data.frame(predRas), type = "response")

However, this (taking two data points) does not work:

p <- predict(pois1, as.data.frame(predRas)[1:2,], type = "response")
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
#  variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 2 rows but variables found have 40 rows 

So, regardless of the raster data, can you (and if so, how?) use a model like this to predict for (any number of) new data points?
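One way to predict for any number of new rows is to write the offset as an offset() term inside the formula, because predict() evaluates that term in newdata. The sketch below uses simulated data, not the survey data above; the names dat, fit, and newdat are made up for illustration. The new data then needs a sitesPerGrid column holding the survey effort you want to assume.

```r
# Simulated data for illustration only (not the survey data from the question)
set.seed(42)
n <- 40
dat <- data.frame(
  bio2 = runif(n, 11, 16),
  bio9 = runif(n, 17, 27),
  sitesPerGrid = sample(1:10, n, replace = TRUE)
)
dat$count <- rpois(
  n,
  lambda = dat$sitesPerGrid * exp(0.5 + 0.1 * dat$bio2 + 0.02 * dat$bio9)
)

# Offset written as a formula term, so predict() looks it up in newdata
fit <- glm(count ~ bio2 + bio9 + offset(log(sitesPerGrid)),
           family = poisson(link = "log"), data = dat)

# Two new points, assuming one survey per grid cell
newdat <- data.frame(bio2 = c(13, 15), bio9 = c(24, 20), sitesPerGrid = 1)
p_one <- predict(fit, newdata = newdat, type = "response")

# Same points, assuming five surveys per grid cell
newdat$sitesPerGrid <- 5
p_five <- predict(fit, newdata = newdat, type = "response")
```

With a log link, the offset multiplies the expected count, so p_five is five times p_one. Setting sitesPerGrid to 1 gives the expected count per survey.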

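Using a raster for the offset variable solved the problem. The raster is created based on an assumption. For example, I may want to see the predictions if every grid cell had one site, or mean(sitesPerGrid) or max(sitesPerGrid) sites. If my assumption is mean(sitesPerGrid), then the raster used for prediction would be: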

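# make new raster for sitesPerGrid, copying the geometry of the bio_2 raster
# (bio2 is the numeric vector, bio_2 is the RasterLayer)
rasGrid <- bio_2
rasGrid[] <- mean(testdf$sitesPerGrid)
predRas <- stack(bio_2, bio_9, rasGrid)
# layer names must match the variable names used in the model
names(predRas) <- c("bio2", "bio9", "sitesPerGrid")
p <- raster::predict(predRas, pois1, type = "response")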
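The different assumptions (one site, the mean, or the maximum number of sites per grid cell) can be compared by wrapping the constant-layer step in a small helper. This is a sketch under stated assumptions: it uses simulated data rather than the survey data above, it writes the offset as an offset() term in the formula, and predict_scenario, r_bio2, and r_bio9 are hypothetical names introduced here.

```r
library(raster)

# Simulated data for illustration only (not the survey data from the question)
set.seed(1)
n <- 40
dat <- data.frame(
  bio2 = runif(n, 11, 16),
  bio9 = runif(n, 17, 27),
  sitesPerGrid = sample(1:10, n, replace = TRUE)
)
dat$count <- rpois(
  n,
  lambda = dat$sitesPerGrid * exp(0.5 + 0.1 * dat$bio2 + 0.02 * dat$bio9)
)

fit <- glm(count ~ bio2 + bio9 + offset(log(sitesPerGrid)),
           family = poisson(link = "log"), data = dat)

# Predictor rasters with 5 x 8 = 40 cells
r_bio2 <- raster(nrow = 5, ncol = 8, xmn = 0, xmx = 1, ymn = 0, ymx = 1)
r_bio9 <- raster(r_bio2)
values(r_bio2) <- dat$bio2
values(r_bio9) <- dat$bio9

# Hypothetical helper: predict assuming n_sites surveys in every cell
predict_scenario <- function(n_sites) {
  r_sites <- raster(r_bio2)  # empty layer with the same geometry
  values(r_sites) <- rep(n_sites, ncell(r_sites))
  s <- stack(r_bio2, r_bio9, r_sites)
  names(s) <- c("bio2", "bio9", "sitesPerGrid")  # must match model variables
  raster::predict(s, fit, type = "response")
}

p_one  <- predict_scenario(1)
p_mean <- predict_scenario(mean(dat$sitesPerGrid))
p_max  <- predict_scenario(max(dat$sitesPerGrid))
```

Because the model uses a log link, each scenario is the one-site prediction multiplied by the assumed number of sites, so p_one can be read as the expected count per survey.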
