So imagine these ages for two groups, females and males:
femalesage<-c(30,52,59,25,26,72,46,32,64,45)
malesage<-c(40,56,31,63,63,78,42,45,67)
I can easily run t.test(femalesage, malesage) to get the following result:
t.test(femalesage,malesage)
Welch Two Sample t-test
data: femalesage and malesage
t = -1.2013, df = 16.99, p-value = 0.2461
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-24.224797 6.647019
sample estimates:
mean of x mean of y
45.10000 53.88889
Now suppose I have organized the same data differently, like this:
ages<-c(30,52,59,25,26,72,46,32,64,45,40,56,31,63,63,78,42,45,67)
genders<-c(rep("F",10),rep("M",9))
df<-data.frame(ages, genders)
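I would like to use some kind of regression test to get results similar to the Welch Two Sample t-test, testing the slope Beta1 = 0 against Beta1 != 0, where Beta1 is the coefficient for gender and the response is age. Any idea how I can get the same result?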
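Both the t-test and linear regression are special cases of the general linear model. With a single two-level predictor such as gender, the significance test of the regression coefficient is equivalent to the significance test of the two-sample t-test with equal variances assumed.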
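To make the equivalence concrete, here is a short sketch of the model behind it. The notation (the indicator $x_i$, the group sizes $n_F$, $n_M$, and the pooled standard deviation $s_p$) is introduced here for illustration and is not part of the question.

```latex
\[
  \text{age}_i = \beta_0 + \beta_1 x_i + \varepsilon_i ,
  \qquad
  x_i =
  \begin{cases}
    0 & \text{if person } i \text{ is female} \\
    1 & \text{if person } i \text{ is male}
  \end{cases}
\]
% Least squares recovers the group means:
\[
  \hat\beta_0 = \bar y_F , \qquad \hat\beta_1 = \bar y_M - \bar y_F
\]
% The test of beta_1 = 0 is the pooled two-sample t statistic:
\[
  t = \frac{\hat\beta_1}{s_p \sqrt{\frac{1}{n_F} + \frac{1}{n_M}}} ,
  \qquad
  s_p^2 = \frac{(n_F - 1)\, s_F^2 + (n_M - 1)\, s_M^2}{n_F + n_M - 2} ,
  \qquad
  \text{df} = n_F + n_M - 2
\]
```

So the intercept estimates the mean age of the reference group (F), and the slope estimates the difference between the two group means.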
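R's t.test function accepts input data in two different ways: either as two separate vectors, as you did, or through the formula interface, as I do here. Likewise, the lm function, which fits a simple linear regression, takes a formula. In this case that makes the two function calls nearly identical; we only need to change the function's name.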
Your data:
ages <- c(30,52,59,25,26,72,46,32,64,45,40,56,31,63,63,78,42,45,67)
genders <- c(rep("F", 10), rep("M", 9))
df <- data.frame(ages, genders)
A t-test:
t.test(ages ~ genders, data = df)
Welch Two Sample t-test
data: ages by genders
t = -1.2013, df = 16.99, p-value = 0.2461
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-24.224797 6.647019
sample estimates:
mean in group F mean in group M
45.10000 53.88889
The (almost) identical regression:
summary(lm(ages ~ genders, data = df))
Call:
lm(formula = ages ~ genders, data = df)
Residuals:
Min 1Q Median 3Q Max
-22.89 -13.49 0.90 11.11 26.90
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.100 5.060 8.914 8.12e-08 ***
gendersM 8.789 7.351 1.196 0.248
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16 on 17 degrees of freedom
Multiple R-squared: 0.07756, Adjusted R-squared: 0.0233
F-statistic: 1.429 on 1 and 17 DF, p-value: 0.2483
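Note that the estimate for gendersM (8.789) is exactly the difference between the two group means, and that the t statistic and p-value are nearly identical to those from the t-test. The sign of t flips because t.test compares F minus M, while the regression coefficient is M minus F. The small remaining gap arises because the Welch test does not assume equal variances in the two groups, whereas lm does.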
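If an exact match is wanted, one option is to ask t.test for the pooled-variance (Student) test through its var.equal argument. The following is a minimal sketch on the same data; the object names pooled, fit and slope are made up for this illustration.

```r
# Same data as above: 10 females followed by 9 males
ages <- c(30,52,59,25,26,72,46,32,64,45,40,56,31,63,63,78,42,45,67)
genders <- c(rep("F", 10), rep("M", 9))
df <- data.frame(ages, genders)

# Student (pooled-variance) t-test instead of the default Welch test
pooled <- t.test(ages ~ genders, data = df, var.equal = TRUE)

# Regression on the same formula; pull out the row for the gender coefficient
fit <- summary(lm(ages ~ genders, data = df))
slope <- fit$coefficients["gendersM", ]

# The t statistics agree in magnitude (the sign differs: F - M versus M - F)
all.equal(abs(unname(pooled$statistic)), abs(unname(slope["t value"])))

# The p-values and the degrees of freedom agree as well
all.equal(pooled$p.value, unname(slope["Pr(>|t|)"]))
unname(pooled$parameter) == fit$df[2]
```

Each of the three comparisons should evaluate to TRUE. Whether the equal-variance assumption is reasonable for your data is a separate question. The Welch test is R's default precisely because it does not rely on that assumption.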