R - GAM error when using bam() with family = betar



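I am having trouble resolving an error that occurs when I run bam() from mgcv.

I noticed that a similar error was reported here 14 months ago. There was no agreed solution; the suggestion was to email Simon Wood.

My data is here. The dataset is too large to paste the output of dput().

If I run the model below on the whole dataset, I get the following warnings: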

library(mgcv)
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error,
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
Warning messages:
1: In estimate.theta(theta, family, G$y, linkinv(eta), scale = scale1,  :
step failure in theta estimation
2: In wt * LS :
longer object length is not a multiple of shorter object length
3: In muth * (log(y) - log1p(-y)) :
longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
longer object length is not a multiple of shorter object length
6: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
longer object length is not a multiple of shorter object length
7: In prior.weights * y :
longer object length is not a multiple of shorter object length
8: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) -  :
longer object length is not a multiple of shorter object length
However, if I run the same model on the whole dataset with only the last row excluded, the model appears to run fine:
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[1:20500, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)

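This makes me think there is something wrong with the last row of the dataset. However, I cannot see anything in that row that I would expect to produce the warnings above.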
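For readers who want to run the same kind of inspection, here is a minimal base R sketch of row-level sanity checks. The data frame `dat` is simulated and only borrows the column names from the model formula, and `check_row()` is a hypothetical helper, not part of mgcv. It shows one way to look at a single row and is not a diagnosis of the warnings.

```r
# Simulated stand-in for the real data; only the column names follow the
# model formula in the question. Substitute the real data frame for `dat`.
set.seed(42)
dat <- data.frame(
  pt10     = runif(200),
  org.type = factor(sample(c("a", "b"), 200, replace = TRUE)),
  region   = factor(sample(c("north", "south"), 200, replace = TRUE)),
  year     = sample(2000:2015, 200, replace = TRUE)
)

# Hypothetical helper: basic checks for row i of data frame d
check_row <- function(d, i) {
  row <- d[i, , drop = FALSE]
  list(
    has_na = anyNA(row),
    # a beta response must lie strictly between 0 and 1
    response_in_unit_interval = row$pt10 > 0 && row$pt10 < 1,
    # how many rows share this row's factor levels
    org_type_count = sum(d$org.type == row$org.type),
    region_count   = sum(d$region == row$region)
  )
}

last_row <- check_row(dat, nrow(dat))
str(last_row)
```

If every check passes for the last row, as the question reports, the cause is more likely to lie in the fitting settings than in that one observation.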

If I run the same model on a small subset of the data, this time including the last row, the model again appears to run fine.

m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[20400:20501, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)

But a larger subset of the data, again including the last row, produces warnings similar to those above.

m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[10000:20501, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
Warning messages:
1: In wt * LS :
longer object length is not a multiple of shorter object length
2: In muth * (log(y) - log1p(-y)) :
longer object length is not a multiple of shorter object length
3: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
longer object length is not a multiple of shorter object length
6: In prior.weights * y :
longer object length is not a multiple of shorter object length
7: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) -  :
longer object length is not a multiple of shorter object length
8: In bgam.fit(G, mf, chunk.size, gp, scale, gamma, method = method,  :
algorithm did not converge

Any advice would be welcome.

I suspect the problem is your eps (which may in turn indicate a problem with your data).

The default value is:

r$> .Machine$double.eps*100                                                     
[1] 2.220446e-14

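So with eps = 0.1 you are truncating all response values to the interval [eps, 1 - eps] (i.e. any y < eps or y > 1 - eps is reset to eps or 1 - eps respectively). I suspect this causes problems for the fitting algorithm, which runs into a situation it was not written to expect. If a substantial number of values lie outside [eps, 1 - eps], all of them pile up at the limits of the range, and I suspect that this is why small changes to the data trigger numerical problems in the fitting algorithm.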
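To see how much of the response such clamping would move, one can count the values outside [eps, 1 - eps] before fitting. This is a minimal base R sketch under stated assumptions: `y` is a simulated response (substitute `error$pt10`), and `clamp_summary()` is a hypothetical helper that only mimics the resetting described above. It does not call mgcv's internal code.

```r
# Hypothetical helper (not part of mgcv): mimic resetting values outside
# [eps, 1 - eps] to the nearest limit and report how many values were moved.
clamp_summary <- function(y, eps) {
  low  <- sum(y < eps, na.rm = TRUE)
  high <- sum(y > 1 - eps, na.rm = TRUE)
  list(
    n_low      = low,
    n_high     = high,
    prop_moved = (low + high) / sum(!is.na(y)),
    clamped    = pmin(pmax(y, eps), 1 - eps)
  )
}

# Simulated stand-in for the real response, with mass near 0 and 1
set.seed(1)
y <- rbeta(1000, shape1 = 0.5, shape2 = 0.5)

res_big     <- clamp_summary(y, eps = 0.1)
res_default <- clamp_summary(y, eps = .Machine$double.eps * 100)

res_big$prop_moved      # share of values reset when eps = 0.1
res_default$prop_moved  # share of values reset with the default eps
```

If a large share of the real response is moved with eps = 0.1, that supports the suspicion that the truncation, and not one particular row, is behind the warnings.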

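The fact that you are truncating the data in this way suggests that the beta distribution is not the right distribution for your data. Without further information, I would look for a more appropriate approach.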
