r - Error running GEE logistic model: NA/NaN/Inf in foreign function call (arg 2)



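I am running a logistic regression model fitted with generalized estimating equations (GEEs), and despite trying several solutions posted on SO and elsewhere I still get the error below. I am not sure where the error comes from. I am using the gee package, but the error also occurs with geepack.

Does anyone know why this error might occur even though there are no NA, Inf, or character variables in the dataset? I suspect I am missing something very simple, but after two days I have to hand it over to better programmers than me.

Below are minimal data and code to reproduce the error, my attempts at a solution, and related SO questions.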


Data
df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L,  24L, 21L, 19L, 5L, 4L, 18L, 
13L, 23L, 16L, 25L, 12L, 10L, 9L,  22L, 17L, 11L, 3L, 2L, 2L), 
levels = c("ALWA28M", "BOMA13M", "BOMA41M",  "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M", 
"FASI6M", "FRRO35M",  "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM", 
"MAAD60M",  "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",  
"STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),      
testres = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), 
levels = c("POS", "NEG"), class = "factor"), 
agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L, 
3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L), 
levels = c("0", "1", "2", "3", "4", "5"), class = "factor")), 
row.names = c(NA,  26L), 
class = "data.frame")

Model
gee::gee(testres ~ agegrp, data = df, 
id = id, 
family = binomial, 
corstr = "exchangeable")

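Error

Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, : NA/NaN/Inf in foreign function call (arg 2)
Warning message:
In gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, : NAs introduced by coercion

Checking the data to make sure there are no NA, Inf, or character variables: every column is a factor with no missing data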

# All factors
str(df)
# 'data.frame':   26 obs. of  3 variables:
# $ id     : Factor w/ 25 levels "ALWA28M","BOMA13M",..: 7 1 20 15 14 6 8 24 21 19 ...
# $ testres: Factor w/ 2 levels "POS","NEG": 1 1 1 2 1 1 1 1 1 1 ...
# $ agegrp : Factor w/ 6 levels "0","1","2","3",..: 5 3 3 5 1 1 2 2 1 2 ...
# No NAs or Infinites
lapply(df, table, useNA = "always")
# 0 NAs
lapply(df, \(x) table(is.infinite(x)))
# All FALSE

Alternative approach using geepack

geepack::geeglm(testres ~ agegrp,
data = df, id = id,
corstr = "exchangeable",
family = "binomial")

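geepack error:

Error in lm.fit(zsca, qlf(pr2), offset = offset) : NA/NaN/Inf in 'y'
Warning messages:
1: In model.response(mf, "numeric") : using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, mu) : '-' not meaningful for factors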
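
The two warnings above point at the response being a factor. Here is a minimal base-R sketch of what they suggest, using a toy vector (illustrative values, not the real data) and assuming POS is the event to be coded as 1:

```r
# Toy response with the same levels as testres (values are illustrative only)
y <- factor(c("POS", "NEG", "POS", "POS"), levels = c("POS", "NEG"))

# Arithmetic on a factor is not defined: the result is all NA, with the
# warning "'-' not meaningful for factors"
resid_like <- suppressWarnings(y - 0.5)

# Coercing through the labels also gives NA ("NAs introduced by coercion"),
# because "POS" and "NEG" are not numbers
via_labels <- suppressWarnings(as.numeric(as.character(y)))

# An explicit 0/1 recoding; 1 = POS is an assumption about the event of interest
y01 <- as.integer(y == "POS")

print(resid_like)
print(via_labels)
print(y01)
```

On the real data the same recoding would be `df$testres <- as.integer(df$testres == "POS")`, which avoids handing a factor response to the fitting routine.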

Changing the correlation structure produces the same error. A standard logistic regression converges:

summary(glm(testres ~ agegrp, data = df, family = binomial(link = "logit")))

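Questions that did not solve the problem. Although this error comes up often on the site, I don't think SO has an adequate answer to it, so I decided to post.

  1. How to eliminate "NA/NaN/Inf in foreign function call (arg 7)" with randomForest
  2. Running predict
  3. R: NA/NaN/Inf in foreign function call (arg 1)
  4. Error fitting a model with gee(): NA/NaN/Inf in foreign function call (arg 3)
  5. NA/NaN/Inf in foreign function call (arg 2)
  6. NA/NaN/Inf in foreign function call (arg 5)
  7. lme: NA/NaN/Inf in foreign function call (arg 3)
  8. NA/NaN/Inf in foreign function call (arg 1) when trying to run PGLS (Pagel's lambda)
  9. How to eliminate "NA/NaN/Inf in foreign function call (arg 3)" in bigglm
  10. R error in glmnet: NA/NaN/Inf in foreign function call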

Using 0 and 1 in testres works:

df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L,  24L, 21L, 19L, 5L, 4L, 18L,
13L, 23L, 16L, 25L, 12L, 10L, 9L,  22L, 17L, 11L, 3L, 2L, 2L),
levels = c("ALWA28M", "BOMA13M", "BOMA41M",  "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M",
"FASI6M", "FRRO35M",  "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM",
"MAAD60M",  "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",
"STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),
testres = structure(c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L)),
agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L,
3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L),
levels = c("0", "1", "2", "3", "4", "5"), class = "factor")),
row.names = c(NA,  26L),
class = "data.frame")
gee::gee(testres ~ agegrp, data = df,
id = id,
family = binomial,
corstr = "exchangeable")
#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#>   (Intercept)       agegrp1       agegrp2       agegrp3       agegrp4 
#>  1.956607e+01 -3.377525e-08 -1.817977e+01 -1.831331e+01 -1.887292e+01 
#>       agegrp5 
#> -3.513736e-08
#> Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, : Cgee: error: logistic model for probability has fitted value very close to 1.
#> estimates diverging; iteration terminated.

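There is now an error because the model has fitted some probabilities very close to 0 or 1, but I think that is an unrelated issue (see the Details section of ?glm).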
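
The fitted probabilities near 0 or 1 can be traced back to the data with a cross-tabulation. This is a base-R sketch, with the outcome and age-group codes copied from the data in the post (1 = POS, 0 = NEG); the variable name `separated` is just an illustrative choice:

```r
# Outcome (1 = POS, 0 = NEG) and age-group codes copied from the posted data
testres <- c(1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
             1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
agegrp <- factor(c(5, 3, 3, 5, 1, 1, 2, 2, 1, 2, 6, 4, 4,
                   3, 4, 4, 3, 4, 4, 4, 4, 3, 5, 4, 2, 2),
                 levels = 1:6, labels = as.character(0:5))

# Counts of each outcome within each age group
tab <- table(agegrp, testres)
print(tab)

# Age groups in which only one outcome value is ever observed
separated <- rownames(tab)[apply(tab, 1, function(r) any(r == 0))]
print(separated)
```

Age groups 0, 1 and 5 contain only positive results, which is consistent with the very large intercept and the near-zero agegrp1 and agegrp5 coefficients in the initial glm estimates shown above. Collapsing sparse age groups is one possible way to handle this, though whether that is appropriate depends on the analysis.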
