Logistic regression error with factors in R

I am trying to run a logistic regression with the following code:

model <- glm (Participation ~ Gender + Race + Ethnicity + Education + Comorbidities + WLProgram + LoseWeight + EverLoseWeight + PastYearLW + Age + BMI, data = LogisticData, family = binomial)

summary(model)

I keep getting this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :  contrasts can be applied only to factors with 2 or more levels

After searching the forums, I checked which variables are factors:

str(LogisticData)
'data.frame':   994 obs. of  13 variables:
$ outcome       : Factor w/ 2 levels "No","Yes": 1 1 2 2 1 2 2 1 2 2 ...
$ Gender        : Factor w/ 3 levels "Male","Female",..: 1 2 2 1 2 1 1 1 1 
$ Race          : Factor w/ 3 levels "White","Black",..: 1 1 1 3 1 1 1 1 1 1 
$ Ethnicity     : Factor w/ 2 levels "Hispanic/Latino",..: 2 2 2 2 2 2 2 2 2 
$ Education     : Factor w/ 2 levels "Below Bachelors",..: 1 1 1 2 1 1 1 2 1 
$ Comorbidities : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 2 2 1 1 ...
$ WLProgram     : Factor w/ 2 levels "No","Yes": NA 1 2 2 1 1 1 NA 1 1 ...
$ LoseWeight    : Factor w/ 2 levels "Yes","No": 2 1 1 1 1 1 1 2 1 1 ...
$ PastYearLW    : Factor w/ 2 levels "Yes","No": NA 2 1 1 1 2 1 NA 1 1 ...
$ EverLoseWeight: Factor w/ 2 levels "Yes","No": 2 1 1 1 1 1 1 2 1 1 ...
$ Age           : int  29 35 69 32 21 45 40 62 59 58 ...
$ Participation : Factor w/ 2 levels "Yes","No": 2 2 1 1 1 1 1 2 1 2 ...
$ BMI           : num  25.7 33.8 26.4 32.3 27.5 ...

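All of the factors appear to have 2 or more levels.

I also tried omitting the NAs, and that still gave me the same error.

I want every variable in the regression, but I can't work out why it won't run.

When I run: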

newdata <- droplevels(na.omit(LogisticData))
> str(newdata)
'data.frame':   840 obs. of  13 variables:
$ outcome       : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 2 2 2 ...
$ Gender        : Factor w/ 3 levels "Male","Female",..: 2 2 1 2 1 1 1 2 1 
$ Race          : Factor w/ 3 levels "White","Black",..: 1 1 3 1 1 1 1 1 3 
$ Ethnicity     : Factor w/ 2 levels "Hispanic/Latino",..: 2 2 2 2 2 2 2 2 
$ Education     : Factor w/ 2 levels "Below Bachelors",..: 1 1 2 1 1 1 1 1 
$ Comorbidities : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 1 1 2 ...
$ WLProgram     : Factor w/ 2 levels "No","Yes": 1 2 2 1 1 1 1 1 1 1 ...
$ LoseWeight    : Factor w/ 1 level "Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ PastYearLW    : Factor w/ 2 levels "Yes","No": 2 1 1 1 2 1 1 1 1 2 ...
$ EverLoseWeight: Factor w/ 1 level "Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Age           : int  35 69 32 21 45 40 59 58 23 32 ...
$ Participation : Factor w/ 2 levels "Yes","No": 2 1 1 1 1 1 1 2 2 1 ...
$ BMI           : num  33.8 26.4 32.3 27.5 45.4 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:154] 1 8 13 14 21 24 25 46 55 58 ...
  .. ..- attr(*, "names")= chr [1:154] "1" "8" "13" "14" ...

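This doesn't make sense to me, because in the first str(LogisticData) output EverLoseWeight clearly has two levels: you can see both "Yes" and "No", and both 1 and 2 in the codes. How do I fix this error?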

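Given your update, it looks like you have at least two options.

1: Remove the factors that are left with only one level after the NAs are dropped (i.e. LoseWeight and EverLoseWeight).

2: Treat NA as an additional level. Something like this: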
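In the rows shown in the question's first str() output, the NA values in WLProgram and PastYearLW fall on the same rows where LoseWeight and EverLoseWeight are at level 2 ("No"). If that pattern holds for the whole data set, dropping incomplete rows (which glm does by default, unless the na.action option has been changed) removes every "No" and leaves those two factors with a single level. Below is a minimal sketch of option 1 on a small made-up data frame; `toy` and its values are assumptions for illustration, not the original data.

```r
# Hypothetical toy data: every row with LoseWeight == "No" has NA in WLProgram,
# mimicking the pattern visible in the question's str() output.
toy <- data.frame(
  Participation = factor(c("No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "No")),
  LoseWeight    = factor(c("No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes")),
  WLProgram     = factor(c(NA, "No", "Yes", "Yes", "No", "No", "Yes", NA, "No", "Yes")),
  Age           = c(29, 35, 69, 32, 21, 45, 40, 62, 59, 58)
)

# Keep complete cases only and drop factor levels that no longer occur.
complete <- droplevels(na.omit(toy))

# Find factors that are left with fewer than two levels.
is_single <- vapply(
  complete,
  function(col) is.factor(col) && nlevels(col) < 2,
  logical(1)
)
single_level <- names(complete)[is_single]
print(single_level)

# Build the formula from the remaining predictors and fit the model.
predictors <- setdiff(names(complete), c("Participation", single_level))
model <- glm(
  reformulate(predictors, response = "Participation"),
  data = complete,
  family = binomial
)
summary(model)
```

Building the formula with reformulate() avoids editing the model call by hand each time the set of usable predictors changes.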

a = as.factor(c(1,1,NA,2))
b = as.factor(c(1,1,2,1))
# 0 is an unused factor level for a
x = data.frame(a, b)
levels(x$a) = c(levels(x$a), 0)
x$a[is.na(x$a)] = 0

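But this may not take care of any singularity problems that also come with the single-level factors.

Try running summary on the original data and make sure every level actually has values. I would have posted this as a comment, but I don't have the reputation points :(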
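The same idea can be applied to every factor column at once. This is only a sketch: the helper `na_as_level` and the label "Missing" are made-up names for illustration, not part of any package.

```r
# Hypothetical helper: turn NA entries of a factor into an explicit level.
na_as_level <- function(f, label = "Missing") {
  stopifnot(is.factor(f))
  if (anyNA(f)) {
    levels(f) <- c(levels(f), label)  # register the new level first
    f[is.na(f)] <- label              # then assign it to the NA entries
  }
  f
}

x <- data.frame(
  a = as.factor(c(1, 1, NA, 2)),
  b = as.factor(c(1, 1, 2, 1))
)

# Apply the helper to every factor column; other columns are left untouched.
x[] <- lapply(x, function(col) if (is.factor(col)) na_as_level(col) else col)

levels(x$a)  # now includes "Missing"
levels(x$b)  # unchanged, since b had no NA
```

If the missing values in one variable line up exactly with a level of another variable, the new "Missing" indicator is perfectly collinear with that level, and glm would be expected to report an NA coefficient for one of them. That is the singularity caveat in concrete form.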
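One way to run that check, sketched on made-up data (`toy` and its values are assumptions): summary() shows the per-level counts and the number of NAs, and a cross-tabulation with useNA = "ifany" shows whether the NAs in one variable are concentrated in a single level of another.

```r
# Hypothetical toy data with the same kind of NA pattern as in the question.
toy <- data.frame(
  LoseWeight = factor(c("No", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes")),
  WLProgram  = factor(c(NA, "No", "Yes", "Yes", "No", NA, "No", "Yes"))
)

# Per-column level counts, including the number of NAs.
summary(toy)

# Cross-tabulate the two factors, keeping NA as its own column.
tab <- table(
  LoseWeight = toy$LoseWeight,
  WLProgram  = toy$WLProgram,
  useNA      = "ifany"
)
print(tab)

# Level counts among complete rows only: a zero here means that level
# disappears once incomplete rows are dropped.
cc_counts <- table(toy$LoseWeight[complete.cases(toy)])
print(cc_counts)
```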
