I'm having trouble resolving an error that occurs when I run bam() from mgcv.
I noticed that a similar error was reported here 14 months ago. There didn't seem to be an agreed solution; the suggestion was to email Simon Wood.
My data is here. The dataset is too large to paste the output of dput().
If I run the model below on the entire dataset, I get the following warnings:
library(mgcv)
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error,
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
Warning messages:
1: In estimate.theta(theta, family, G$y, linkinv(eta), scale = scale1, :
step failure in theta estimation
2: In wt * LS :
longer object length is not a multiple of shorter object length
3: In muth * (log(y) - log1p(-y)) :
longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
6: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
7: In prior.weights * y :
longer object length is not a multiple of shorter object length
8: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - :
longer object length is not a multiple of shorter object length
However, if I run the same model on the entire dataset with only the last row excluded, the model seems to run OK:
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[1:20500, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
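This makes me think there is a problem with the last row of the dataset. However, I can't see anything wrong in that row that I would expect to produce the warning messages above.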
If I run the same model again on a small subset of the data, this time including the last row, the model again seems to run fine:
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[20400:20501, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
But a larger subset of the data, again including the last row, produces warning messages similar to those above:
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[10000:20501, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
Warning messages:
1: In wt * LS :
longer object length is not a multiple of shorter object length
2: In muth * (log(y) - log1p(-y)) :
longer object length is not a multiple of shorter object length
3: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
6: In prior.weights * y :
longer object length is not a multiple of shorter object length
7: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - :
longer object length is not a multiple of shorter object length
8: In bgam.fit(G, mf, chunk.size, gp, scale, gamma, method = method, :
algorithm did not converge
Any advice is welcome.
I suspect the problem is your eps (which may in turn indicate a problem with your data).
The default is:
r$> .Machine$double.eps*100
[1] 2.220446e-14
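So with eps = 0.1 you are truncating all of the response values to the interval [eps, 1-eps] (i.e. any y < eps or y > 1-eps is reset to eps or 1 - eps respectively). I think this is causing problems for the fitting algorithm, which is running into a situation it wasn't written to expect. If a sizeable number of values lie outside [eps, 1-eps], you are piling all of them up at the limits of the range, and I suspect that is why small changes in the data (such as adding or dropping a row) tip the fitting algorithm into numerical problems.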
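A quick way to see how much of the response is affected is to count the values that fall outside [eps, 1-eps]. The following is only a rough sketch in base R: the vector y is simulated as a hypothetical stand-in for error$pt10, the helper clamped_share() is a name made up for this example, and the pmin()/pmax() line simply mimics the truncation described above rather than reproducing mgcv's internals.

```r
# Hypothetical stand-in for the response; in practice use y <- error$pt10.
set.seed(1)
y <- rbeta(1000, shape1 = 0.5, shape2 = 0.5)

# Illustrative helper (not part of mgcv): share of observations that a
# given eps would push onto one of the two boundaries.
clamped_share <- function(y, eps) {
  mean(y < eps | y > 1 - eps, na.rm = TRUE)
}

eps_user    <- 0.1                        # value used in the question
eps_default <- .Machine$double.eps * 100  # default quoted in the answer

# Mimic the truncation to [eps, 1 - eps] described above.
y_clamped <- pmin(pmax(y, eps_user), 1 - eps_user)

clamped_share(y, eps_user)     # proportion reset with eps = 0.1
clamped_share(y, eps_default)  # proportion reset with the default eps

# How many observations now sit exactly on a boundary.
table(on_boundary = y_clamped == eps_user | y_clamped == 1 - eps_user)
```

If the first proportion is far from zero for the real data, a large block of observations is being stacked at 0.1 and 0.9, which is the situation described above.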
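The fact that you need to truncate the data this heavily suggests that this isn't the right distribution for your data. In the absence of other information, I would look elsewhere for a more appropriate approach.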