How do I recode missing data so that my variables have the same length in R?



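I have two variables, the SAT Verbal (SATV) and SAT Quantitative (SATQ) scores. There are 500 rows, and SATQ has 7 NAs. My goal is to run lm() and gvlma() with SATV and SATQ as the IVs.
But I get an error saying R won't run my code: I omitted the NAs from SATQ, and now my variables have different lengths. How do I recode the NAs so that my variables stay the same length?
Please ignore the non-normal data and the violated assumptions. (I also don't really know what I'm doing, so if you can offer advice, pretend you are talking to someone with zero understanding of R.)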

> summary(SATQ)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  200.0   525.0   610.0   604.5   700.0   800.0       7 
> SATQ2<-na.omit(SATQ)
> summary(SATQ2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  200.0   525.0   610.0   604.5   700.0   800.0 
> summary(SATV)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  200.0   537.5   600.0   604.4   690.0   800.0
> summary(ms)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  765.6  1844.0  2133.0  2093.0  2395.0  2877.0 
> #ms= monthly salary
> m1 = lm(ms~SATV+SATQ2)
Error in model.frame.default(formula = ms ~ SATV + SATQ2, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'SATQ2')
> summary(m1)
Call:
lm(formula = ms ~ SATV + SATQ2)
Residuals:
     Min       1Q   Median       3Q      Max 
-1551.58   -12.48    45.32    99.77   168.46 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  57.5656    55.1658   1.044    0.297    
SATV          1.4313     0.1030  13.890   <2e-16 ***
SATQ2         1.9350     0.1025  18.871   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 206.3 on 490 degrees of freedom
  (7 observations deleted due to missingness)
Multiple R-squared:  0.7419,    Adjusted R-squared:  0.7409 
F-statistic: 704.3 on 2 and 490 DF,  p-value: < 2.2e-16
> gvlma(m1)
Call:
lm(formula = ms ~ SATV + SATQ2)
Coefficients:
(Intercept)         SATV        SATQ2  
     57.566        1.431        1.935  

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 
Call:
 gvlma(x = m1) 
                       Value  p-value                   Decision
Global Stat        7.904e+03 0.00e+00 Assumptions NOT satisfied!
Skewness           1.261e+03 0.00e+00 Assumptions NOT satisfied!
Kurtosis           6.593e+03 0.00e+00 Assumptions NOT satisfied!
Link Function      2.317e-02 8.79e-01    Assumptions acceptable.
Heteroscedasticity 5.036e+01 1.28e-12 Assumptions NOT satisfied!

The simplest option is probably to do the following:

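# Put all three variables in one data frame so their rows stay aligned
dta = data.frame(SATV = SATV, SATQ = SATQ, ms = ms)
# na.omit() on a data frame drops every row that contains an NA
lm(ms ~ SATV + SATQ, data = na.omit(dta))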

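This removes the NAs row by row: any row with a missing value in any of the three columns is dropped as a whole, so SATV, SATQ and ms all keep the same length.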
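To see why the original attempt failed and why the data frame approach works, here is a minimal sketch on simulated data. The numbers are made up purely for illustration (they are not the asker's data), and `complete`, `m1` and `m2` are names chosen for this example. It also relies on R's default setting `options(na.action = "na.omit")`, under which lm() already drops incomplete rows by itself.

```r
# Simulated stand-ins for the question's variables (all values are invented)
set.seed(1)
n <- 500
SATV <- round(rnorm(n, mean = 600, sd = 100))
SATQ <- round(rnorm(n, mean = 600, sd = 100))
SATQ[sample(n, 7)] <- NA   # 7 missing values, as in the question
ms <- 50 + 1.4 * SATV + 1.9 * ifelse(is.na(SATQ), 600, SATQ) + rnorm(n, sd = 200)

# na.omit() on a single vector shortens only that vector,
# which is what triggers "variable lengths differ"
SATQ2 <- na.omit(SATQ)
length(SATQ2)   # 493
length(SATV)    # 500

# na.omit() on a data frame drops whole rows, so every column stays aligned
dta <- data.frame(SATV = SATV, SATQ = SATQ, ms = ms)
complete <- na.omit(dta)
nrow(complete)  # 493

m1 <- lm(ms ~ SATV + SATQ, data = complete)

# With the default na.action, lm() drops the incomplete rows on its own,
# so fitting on the unfiltered data frame gives the same coefficients
m2 <- lm(ms ~ SATV + SATQ, data = dta)
all.equal(coef(m1), coef(m2))
```

Either fitted model can then be passed to gvlma() as in the question. The practical point is to keep the original columns together and let the whole row be dropped, rather than creating a shortened copy such as SATQ2.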
