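This may be a naive question, but I have not found a solution. I have a data frame with count data from field surveys, and I want to predict species richness using Poisson regression. The surveys were assigned to grids of equal size, but a different number of surveys was conducted in each grid, so I want to include the "number of surveys per grid" as an offset. The problem is that when I predict from the glm using a raster stack, it requires a raster layer for the offset variable (the number of surveys per grid). My question is how to incorporate the offset variable into the raster stack so that I can produce spatial predictions (i.e., the prediction should be a raster file). Below is my reproducible attempt (using fewer variables):
Create the data frame: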
bio2 <- c(12.74220, 14.10092, 13.82644, 14.30550, 15.02780, 14.88224, 13.98853, 14.89524, 15.59887, 13.98664, 14.75405,
15.38178, 14.50719, 15.00427, 12.77741, 13.25432, 12.91208, 15.75312, 15.36683, 13.33202, 12.55190, 14.94755,
13.52424, 14.75273, 14.42298, 15.37897, 12.02472, 15.49786, 14.28823, 13.01982, 13.60521, 15.07687, 14.17427,
13.24491, 14.84833, 13.52594, 13.92113, 11.39738, 14.31446, 12.10239)
bio9 <- c(26.30980, 26.52826, 27.03376, 23.93621, 26.48416, 26.05859, 25.37550, 25.34595, 25.34056, 23.37793, 25.74681,
22.72016, 22.00458, 24.37140, 22.95169, 24.52542, 24.63087, 22.86291, 23.10240, 23.79215, 24.86875, 21.40718,
23.84258, 21.91964, 25.97682, 24.97625, 22.31471, 19.64094, 23.93386, 25.87234, 25.99514, 17.17149, 20.72802,
18.22862, 24.51112, 24.33626, 23.90822, 23.43660, 23.07425, 20.71244)
count <- c(37, 144, 91, 69, 36, 32, 14, 34, 48, 168, 15, 21, 36, 29, 24, 16, 14, 11, 18, 64, 37, 31, 18, 9, 4,
16, 14, 10, 14, 43, 18, 88, 69, 26, 20, 5, 9, 75, 8, 26)
sitesPerGrid <- c(3, 16, 8, 5, 3, 3, 1, 3, 3, 29, 2, 4, 5, 2, 3, 4, 2, 1, 2, 9, 6, 3, 3, 2, 1, 2, 2, 1, 2, 5, 7, 15, 9, 4,
1, 1, 2, 22, 6, 5)
testdf <- data.frame(bio2, bio9, count, sitesPerGrid)
pois1 <- glm(count ~ bio2 + bio9, offset = log(sitesPerGrid), family = poisson (link = "log"), data = testdf)
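For reference, the role of the offset can be written out explicitly. This is a sketch in notation that is not part of the original post: $\mu_i$ is the expected count in grid $i$ and $n_i$ is its `sitesPerGrid` value. The offset enters the linear predictor with a coefficient fixed at 1, so the remaining terms describe the expected count per survey:

```latex
\log \mu_i = \beta_0 + \beta_1\,\mathrm{bio2}_i + \beta_2\,\mathrm{bio9}_i + \log n_i
\quad\Longleftrightarrow\quad
\mu_i = n_i \exp\!\left(\beta_0 + \beta_1\,\mathrm{bio2}_i + \beta_2\,\mathrm{bio9}_i\right)
```

Any prediction on the count scale therefore needs a value of $n_i$ for every new location.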
Spatial prediction:
library(raster)
bio_2 <- bio_9 <- raster(nrow=5,ncol=8, xmn=0, xmx=1,ymn=0,ymx=1)
values(bio_2) <- bio2
values(bio_9) <- bio9
predRas <- stack(bio_2, bio_9)
names(predRas) <- c("bio2", "bio9")
pdPois <- raster::predict(predRas, pois1, type = "response")
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
# variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 16 rows but variables found have 40 rows
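I get the error because it requires a raster layer for sitesPerGrid. But I don't want to use sitesPerGrid as a predictor.
Update
Based on the comments and the answer given by @robertHijmans, I tried the following code: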
pdPois <- raster::predict(predRas, pois1, const = testdf[, "sitesPerGrid"], type = "response")
I again get an error:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 143811, 40
I see that this works, because the number of data points is the same as the number used to fit the model:
p <- predict(pois1, as.data.frame(predRas), type = "response")
However, this (taking two data points) does not work:
p <- predict(pois1, as.data.frame(predRas)[1:2,], type = "response")
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
# variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 2 rows but variables found have 40 rows
So, irrespective of the raster data, can you (and if so, how?) use such a model to make predictions for (any number of) new data points?
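Using a raster for the offset variable solved the problem. The raster is created based on an assumption about survey effort. For example, I may want to see predictions for one site per grid, or for mean(sitesPerGrid) or max(sitesPerGrid) sites per grid. If my assumption is mean(sitesPerGrid), then the raster used for prediction would be: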
# make new raster for sitesPerGrid
rasGrid <- bio_2  # copy the raster template (bio_2), not the numeric vector bio2
rasGrid[,] <- mean(testdf$sitesPerGrid)
names(rasGrid) <- "sitesPerGrid"
predRas <- stack(bio_2, bio_9, rasGrid)
p <- raster::predict(predRas, pois1, type = "response")
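The same idea can be sketched without rasters, which also addresses the question of predicting for any number of new data points. The example below is a minimal illustration, not the original model: it uses only the first 10 rows of the data above, and the names `demo_df`, `pois_demo` and `new_grids`, as well as the new covariate values, are made up for the example. It assumes the offset is written inside the formula with `offset()`, so that `predict()` looks up `sitesPerGrid` in `newdata` like any other variable.

```r
# First 10 rows of the example data, kept inside the data frame only
demo_df <- data.frame(
  bio2 = c(12.74220, 14.10092, 13.82644, 14.30550, 15.02780,
           14.88224, 13.98853, 14.89524, 15.59887, 13.98664),
  bio9 = c(26.30980, 26.52826, 27.03376, 23.93621, 26.48416,
           26.05859, 25.37550, 25.34595, 25.34056, 23.37793),
  count = c(37, 144, 91, 69, 36, 32, 14, 34, 48, 168),
  sitesPerGrid = c(3, 16, 8, 5, 3, 3, 1, 3, 3, 29)
)

# offset() in the formula fixes the coefficient of log(sitesPerGrid) at 1
pois_demo <- glm(count ~ bio2 + bio9 + offset(log(sitesPerGrid)),
                 family = poisson(link = "log"), data = demo_df)

# Hypothetical new locations; sitesPerGrid encodes the assumed survey effort
new_grids <- data.frame(bio2 = c(13.5, 14.5), bio9 = c(24, 26), sitesPerGrid = 1)
per_site <- predict(pois_demo, newdata = new_grids, type = "response")

# Same locations under the assumption of mean survey effort
new_grids$sitesPerGrid <- mean(demo_df$sitesPerGrid)
at_mean_effort <- predict(pois_demo, newdata = new_grids, type = "response")
```

Because the offset is multiplicative on the count scale, `at_mean_effort` is `per_site` multiplied by `mean(demo_df$sitesPerGrid)`. Choosing the offset value only rescales the predicted surface and does not change its spatial pattern.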