Error message when running a t-test in R because of a character variable

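I have been trying to run a two-sided t-test in R, but I keep getting an error. Below are my steps, the dataset details and my script from RStudio. I used a dataset called LungCapacity, downloaded from this website: https://www.statslectures.com/r-scripts-datasets.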

# Imported the data set into RStudio.
# Ran a summary to see the data and the class of each column.
# Here I could see that the Smoke column is a character, so I converted it to a factor:
LungCapacityData$Smoke <- factor(LungCapacityData$Smoke)
# Checking the summary again, I can see it is now a factor with levels "no" and "yes".
# I want to run a t-test between lung capacity and smoking.
t.test(LungCapacityData$LungCap, LungCapacityData$Smoke, alternative = c("two.sided"), mu = 0, var.equal = FALSE, conf.level = 0.95, paired = FALSE)


Error in var(y) : Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
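A minimal sketch of why this call fails, using a small hypothetical `toy` data frame in place of the real LungCapacity data (the values are made up). With two vectors, `t.test()` treats the second one as a numeric sample: `mean()` on the factor only warns and returns `NA`, but `var()` on a factor is an error.

```r
# Hypothetical toy data standing in for LungCapacityData (values are made up)
toy <- data.frame(
  LungCap = c(7.2, 8.1, 6.9, 7.7, 8.9, 9.4, 8.3, 9.0),
  Smoke   = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

# Passing the factor as the second sample reproduces the error:
# mean(y) warns and returns NA (warning suppressed here), then var(y) stops.
msg <- tryCatch(
  suppressWarnings(t.test(toy$LungCap, toy$Smoke)),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```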

I tried converting the Smoke variable from "yes" and "no" to 1 and 0. The code then ran, but the result is not correct. What am I doing wrong?


LungCapacityData <- read.table(
header = TRUE)
t.test(LungCap ~ Smoke, data = LungCapacityData,
       alternative = c("two.sided"), mu = 0, var.equal = FALSE,
       conf.level = 0.95, paired = FALSE)
#   Welch Two Sample t-test
#data:  LungCap by Smoke
#t = -3.6498, df = 117.72, p-value = 0.0003927
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -1.3501778 -0.4003548
#sample estimates:
# mean in group no mean in group yes 
#         7.770188          8.645455 


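For reference, here are the first ten values of the two vectors passed to `t.test()` in the question. `LungCap` is numeric:

# [1]  6.475 10.125  9.550 11.125  4.800  6.225  4.950  7.325  8.875  6.800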


`Smoke` holds group labels, not numbers:

# [1] no  yes no  no  no  no  no  no  no  no 


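The formula `LungCap ~ Smoke` says that `LungCap` should depend on `Smoke`: `t.test()` splits `LungCap` into the groups defined by `Smoke` and compares the two group means. When you use a formula, you also need to supply `data =` so R knows where to find both columns.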
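To illustrate what the formula does, here is a sketch on made-up `toy` data (not the real dataset). `split()` groups `LungCap` by the levels of `Smoke`; passing the first level's values as `x` and the second level's as `y` gives the same t statistic as the formula call.

```r
# Hypothetical toy data (NOT the LungCapacity dataset), only to show the mechanics
toy <- data.frame(
  LungCap = c(7.2, 8.1, 6.9, 7.7, 8.9, 9.4, 8.3, 9.0),
  Smoke   = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

# LungCap ~ Smoke: split the numeric response by the levels of the factor
groups <- split(toy$LungCap, toy$Smoke)
str(groups)  # a list with elements "no" and "yes"

# Formula interface: data = tells R where LungCap and Smoke live
fit_formula <- t.test(LungCap ~ Smoke, data = toy)

# Equivalent two-vector call: first level ("no") as x, second ("yes") as y
fit_vectors <- t.test(groups$no, groups$yes)

all.equal(unname(fit_formula$statistic), unname(fit_vectors$statistic))
```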


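Converting `Smoke` to numbers only yields its level codes (1 = no, 2 = yes; or 0 and 1 if recoded by hand). `t.test(LungCap, Smoke)` then runs, but it compares mean lung capacity with the mean of these codes, which is not a meaningful test:

# [1] 1 2 1 1 1 1 1 1 1 1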
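A short sketch contrasting the two calls on hypothetical `toy` values (made up for illustration): the numeric-code version returns a result whose second "mean" is just the average code, while the formula version returns the two group means of `LungCap`.

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  LungCap = c(7.2, 8.1, 6.9, 7.7, 8.9, 9.4, 8.3, 9.0),
  Smoke   = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

codes <- as.numeric(toy$Smoke)  # factor levels no/yes become 1/2
print(codes)

# Runs, but compares mean(LungCap) with mean(codes): the wrong question
wrong <- t.test(toy$LungCap, codes)
print(wrong$estimate)  # "mean of x" = mean LungCap, "mean of y" = mean code

# Right: use Smoke to group LungCap and compare the two groups
right <- t.test(LungCap ~ Smoke, data = toy)
print(right$estimate)  # mean LungCap for "no" and for "yes"
```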



t.test(LungCapacityData$LungCap[LungCapacityData$Smoke == "yes"],
       LungCapacityData$LungCap[LungCapacityData$Smoke == "no"],
       alternative = c("two.sided"), mu = 0, var.equal = FALSE,
       conf.level = 0.95, paired = FALSE)
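One detail worth checking with this subsetting approach: the `"yes"` group is passed first, so the difference is yes minus no, whereas the formula orders the groups by factor level (no, then yes). A sketch on made-up `toy` data shows that only the sign of the t statistic changes; the p-value is the same.

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  LungCap = c(7.2, 8.1, 6.9, 7.7, 8.9, 9.4, 8.3, 9.0),
  Smoke   = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

# Formula: groups in level order, so the difference is mean(no) - mean(yes)
fit_formula <- t.test(LungCap ~ Smoke, data = toy)

# Subsetting with "yes" first: the difference is mean(yes) - mean(no)
fit_subset <- t.test(toy$LungCap[toy$Smoke == "yes"],
                     toy$LungCap[toy$Smoke == "no"])

print(c(formula = unname(fit_formula$statistic),
        subset  = unname(fit_subset$statistic)))
```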

As stated in the OP, `t.test()` compares the means of two vectors, so when called as `t.test(x, y)` it expects both `x` and `y` to be numeric.
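Since `t.test(x, y)` needs numeric vectors, it can help to check column classes right after reading the file. This is a sketch on a hypothetical `toy` data frame; note that since R 4.0.0, `read.table()` defaults to `stringsAsFactors = FALSE`, which is why a text column such as `Smoke` arrives as character.

```r
# Hypothetical toy data; Smoke is character, as read.table() returns it by default
toy <- data.frame(
  LungCap = c(7.2, 8.1, 6.9, 7.7, 8.9, 9.4, 8.3, 9.0),
  Smoke   = c("no", "no", "no", "no", "yes", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

print(sapply(toy, class))  # LungCap "numeric", Smoke "character"

# Turn every character column into a factor (a grouping variable)
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# Only numeric columns can serve as x or y in t.test(x, y)
print(sapply(toy, is.numeric))
```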


data <- read.table(file = "./data/LungCapData.txt", header = TRUE)
t.test(LungCap ~ Smoke, data = data)
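This shorter call relies on `t.test()`'s defaults, which are the same values spelled out in the longer call earlier: two-sided alternative, `mu = 0`, `var.equal = FALSE` (Welch's test), `conf.level = 0.95`, and `paired = FALSE`. A sketch on made-up `toy` data checking that the two forms agree (`paired` is simply left at its default):

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  LungCap = c(7.2, 8.1, 6.9, 7.7, 8.9, 9.4, 8.3, 9.0),
  Smoke   = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

short <- t.test(LungCap ~ Smoke, data = toy)
long  <- t.test(LungCap ~ Smoke, data = toy,
                alternative = "two.sided", mu = 0,
                var.equal = FALSE, conf.level = 0.95)

# Same p-value (and same statistic and confidence interval)
print(c(short = short$p.value, long = long$p.value))
```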


> t.test(LungCap ~ Smoke,data = data)
Welch Two Sample t-test
data:  LungCap by Smoke
t = -3.6498, df = 117.72, p-value = 0.0003927
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.3501778 -0.4003548
sample estimates:
mean in group no mean in group yes 
7.770188          8.645455 
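If you need the numbers rather than the printed report, the result of `t.test()` is an `htest` list whose components can be pulled out directly. A sketch on made-up `toy` data; with the real data, `conf.int` would be the interval for mean(no) minus mean(yes) printed above.

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  LungCap = c(7.2, 8.1, 6.9, 7.7, 8.9, 9.4, 8.3, 9.0),
  Smoke   = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

res <- t.test(LungCap ~ Smoke, data = toy)

print(res$statistic)  # t value
print(res$parameter)  # Welch-Satterthwaite degrees of freedom
print(res$p.value)    # two-sided p-value
print(res$conf.int)   # 95% CI for mean(no) - mean(yes)
print(res$estimate)   # group means, one per level of Smoke

# Difference in means, same sign convention as the confidence interval
diff_means <- unname(res$estimate[1] - res$estimate[2])
print(diff_means)
```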
