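I have two variables, the Verbal (SATV) and Quantitative (SATQ) SAT scores. There are 500 rows, and SATQ has 7 NA's. My goal is to run lm() and gvlma() with SATV and SATQ as the IVs.
However, I get an error saying R won't run my code: I omitted the NAs from SATQ, and now my variables have different lengths. How do I recode the NA's so that my variables stay the same length?
Please ignore the non-normal data and the assumption violations. (I also have no idea what I'm doing, so if you can offer advice, pretend you're talking to someone with zero understanding of R.)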
> summary(SATQ)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
200.0 525.0 610.0 604.5 700.0 800.0 7
> SATQ2<-na.omit(SATQ)
> summary(SATQ2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
200.0 525.0 610.0 604.5 700.0 800.0
> summary(SATV)
Min. 1st Qu. Median Mean 3rd Qu. Max.
200.0 537.5 600.0 604.4 690.0 800.0
> summary(ms)
Min. 1st Qu. Median Mean 3rd Qu. Max.
765.6 1844.0 2133.0 2093.0 2395.0 2877.0
> #ms= monthly salary
> m1 = lm(ms~SATV+SATQ2)
Error in model.frame.default(formula = ms ~ SATV + SATQ2, drop.unused.levels = TRUE) :
variable lengths differ (found for 'SATQ2')
> summary(m1)
Call:
lm(formula = ms ~ SATV + SATQ2)
Residuals:
Min 1Q Median 3Q Max
-1551.58 -12.48 45.32 99.77 168.46
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.5656 55.1658 1.044 0.297
SATV 1.4313 0.1030 13.890 <2e-16 ***
SATQ2 1.9350 0.1025 18.871 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 206.3 on 490 degrees of freedom
(7 observations deleted due to missingness)
Multiple R-squared: 0.7419, Adjusted R-squared: 0.7409
F-statistic: 704.3 on 2 and 490 DF, p-value: < 2.2e-16
> gvlma(m1)
Call:
lm(formula = ms ~ SATV + SATQ2)
Coefficients:
(Intercept) SATV SATQ2
57.566 1.431 1.935
ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance = 0.05
Call:
gvlma(x = m1)
Value p-value Decision
Global Stat 7.904e+03 0.00e+00 Assumptions NOT satisfied!
Skewness 1.261e+03 0.00e+00 Assumptions NOT satisfied!
Kurtosis 6.593e+03 0.00e+00 Assumptions NOT satisfied!
Link Function 2.317e-02 8.79e-01 Assumptions acceptable.
Heteroscedasticity 5.036e+01 1.28e-12 Assumptions NOT satisfied!
The simplest option is probably to do the following:
dta = data.frame(SATV=SATV, SATQ=SATQ, ms = ms)
lm(ms ~ SATV + SATQ, data = na.omit(dta))
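This removes NAs row-wise: any row with a missing value in any column is dropped, so all three variables keep the same length.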
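To see the row-wise behaviour in action, here is a minimal sketch on simulated data. The numbers are made up for illustration and only mimic the shape of the question's data (500 rows, 7 NAs in SATQ); the names `complete`, `m1`, and `m2` are placeholders. It also shows that passing the unfiltered data frame to lm() with `na.action = na.omit` (which is R's default setting) should give the same fit, so the separate SATQ2 vector is not needed.

```r
# Simulated stand-in for the asker's data; all values are invented.
set.seed(1)
n <- 500
SATV <- round(rnorm(n, mean = 600, sd = 100))
SATQ <- round(rnorm(n, mean = 600, sd = 100))
ms   <- 50 + 1.4 * SATV + 1.9 * SATQ + rnorm(n, sd = 200)  # monthly salary
SATQ[sample(n, 7)] <- NA          # introduce 7 missing values in SATQ only

# Put everything in one data frame so rows stay aligned.
dta <- data.frame(SATV = SATV, SATQ = SATQ, ms = ms)

# na.omit() on a data frame drops whole rows, not single values.
complete <- na.omit(dta)
nrow(complete)                    # 500 - 7 rows remain

# Fit on the complete rows.
m1 <- lm(ms ~ SATV + SATQ, data = complete)

# Equivalent: let lm() drop the incomplete rows itself.
m2 <- lm(ms ~ SATV + SATQ, data = dta, na.action = na.omit)

all.equal(coef(m1), coef(m2))     # both fits use the same 493 rows
```

Either model object can then be passed to summary() or gvlma() as in the question. Keeping the variables together in a data frame is what prevents the "variable lengths differ" error, because a row is either kept or dropped as a whole.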