Error message when running a t-test in R because of a character variable



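I have been trying to run a two-sided t-test in R but keep hitting an error. Below are my process, the dataset details, and my script from RStudio. I used a dataset called LungCapacity, downloaded from this site: https://www.statslectures.com/r-scripts-datasets.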

# Imported the data set into RStudio.
# Ran a summary report to see the data and class.
summary(LungCapData)
# Here I could see that the Smoke column is a character, so I converted it to a factor.
LungCapData$Smoke <- factor(LungCapData$Smoke)
# On checking the summary again, I see it's converted to a factor with levels yes and no.
# I want to run a t-test between lung capacity and smoking. 
t.test(LungCapData$LungCap, LungCapData$Smoke, alternative = c("two.sided"), mu=0, var.equal = FALSE, conf.level = 0.95, paired = FALSE)

Now when I run it, I get the following error.

Error in var(y) : Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA

I tried converting the Smoke variable from "yes" and "no" to 1 and 0. The code runs, but the result is not correct. What am I doing wrong?

You are very close; you just need to call t.test with a formula:

LungCapacityData <- read.table(
"https://docs.google.com/uc?id=0BxQfpNgXuWoITmVwQzJ2VF9qVlU&export=download",
header = TRUE)
# paired is left out: it defaults to FALSE, and newer R versions reject it in the formula method
t.test(LungCap ~ Smoke, data = LungCapacityData,
alternative = c("two.sided"), mu=0, var.equal = FALSE,
conf.level = 0.95)
#   Welch Two Sample t-test
#
#data:  LungCap by Smoke
#t = -3.6498, df = 117.72, p-value = 0.0003927
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -1.3501778 -0.4003548
#sample estimates:
# mean in group no mean in group yes 
#         7.770188          8.645455 

With your current approach, you are trying to compare LungCapacityData$LungCap, which is a numeric vector:

LungCapacityData$LungCap[1:10]
# [1]  6.475 10.125  9.550 11.125  4.800  6.225  4.950  7.325  8.875  6.800

with LungCapacityData$Smoke, which is a factor:

LungCapacityData$Smoke[1:10]
# [1] no  yes no  no  no  no  no  no  no  no 

Instead, you want to tell t.test to compare LungCapacityData$LungCap when grouped by LungCapacityData$Smoke. This is done with a formula.

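The formula LungCap ~ Smoke says that LungCap should depend on Smoke. When using a formula, you also supply data = so that the column names are looked up in the data frame.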
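As a small sketch of the formula interface, the example below builds a made-up data frame and runs the same kind of call. The values and the name `toy` are invented for illustration and are not the LungCap data.

```r
# Hypothetical data: not the real LungCapData, just enough to run the call.
set.seed(1)
toy <- data.frame(
  LungCap = c(rnorm(20, mean = 7.8), rnorm(20, mean = 8.6)),
  Smoke   = factor(rep(c("no", "yes"), each = 20))
)

# Left of ~ is the numeric response, right of ~ is the two-level grouping factor.
res <- t.test(LungCap ~ Smoke, data = toy)

# Group means are reported in factor-level order ("no" before "yes").
res$estimate
```

The grouping variable must have exactly two levels for this call to work.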

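When you tried converting LungCapacityData$Smoke to numeric, you got the wrong result because you only get the factor level indices, which have no biological meaning.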

as.numeric(LungCapacityData$Smoke)[1:10]
# [1] 1 2 1 1 1 1 1 1 1 1

You are essentially asking whether the mean of the arbitrarily assigned factor level codes differs from the mean lung capacity.
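To see why the conversion misleads, here is a sketch on a made-up factor (the values are invented, not taken from the dataset). as.numeric() returns the internal level codes, so passing them as the second sample makes t.test compare lung capacities against a column of 1s and 2s:

```r
# Hypothetical values for illustration only.
smoke   <- factor(c("no", "yes", "no", "no", "yes", "no"))
lungcap <- c(6.5, 10.1, 9.6, 11.1, 4.8, 6.2)

levels(smoke)                # levels are sorted alphabetically: "no", "yes"
codes <- as.numeric(smoke)   # level codes: 1 for "no", 2 for "yes"

# This runs, but it tests mean(lungcap) against mean(codes),
# which is not a comparison between smokers and non-smokers.
wrong <- t.test(lungcap, codes)
wrong$estimate

# The grouped comparison uses the factor as a grouping variable instead.
right <- t.test(lungcap ~ smoke)
right$estimate
```

Recoding to 0 and 1 has the same problem: the second argument is still treated as a sample of measurements, not as group labels.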

Another approach is to subset LungCapacityData$LungCap yourself, but that requires more typing:

t.test(LungCapacityData$LungCap[LungCapacityData$Smoke == "yes"],
LungCapacityData$LungCap[LungCapacityData$Smoke == "no"],
alternative = c("two.sided"), mu=0, var.equal = FALSE,
conf.level = 0.95, paired = FALSE)
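One detail worth knowing when comparing the two approaches: the formula version takes the groups in factor-level order ("no" first), while the subsetting call above passes "yes" first. The t statistic and confidence interval should therefore come out with the opposite sign, while the p-value stays the same. A sketch on invented data (the name `toy` and its values are assumptions for illustration):

```r
# Hypothetical data for illustration only.
set.seed(42)
toy <- data.frame(
  LungCap = c(rnorm(15, mean = 7.8), rnorm(15, mean = 8.6)),
  Smoke   = factor(rep(c("no", "yes"), each = 15))
)

by_formula <- t.test(LungCap ~ Smoke, data = toy)     # "no" minus "yes"
by_subset  <- t.test(toy$LungCap[toy$Smoke == "yes"],
                     toy$LungCap[toy$Smoke == "no"])  # "yes" minus "no"

# Same test, mirrored direction.
all.equal(unname(by_formula$statistic), -unname(by_subset$statistic))
all.equal(by_formula$p.value, by_subset$p.value)
```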

As written in the question, t.test() tries to compare the means of two vectors, so the function expects both of them to be numeric.

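Use the formula version of t.test() instead. With this method, t.test() uses the column on the right-hand side of ~ as the grouping variable and the column on the left-hand side as the numeric variable whose mean is compared between the two groups.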

data <- read.table(file = "./data/LungCapData.txt",header = TRUE)
t.test(LungCap ~ Smoke,data = data)

...and the output:

> t.test(LungCap ~ Smoke,data = data)
Welch Two Sample t-test
data:  LungCap by Smoke
t = -3.6498, df = 117.72, p-value = 0.0003927
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.3501778 -0.4003548
sample estimates:
mean in group no mean in group yes 
7.770188          8.645455 
> 
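If you need the numbers rather than the printed report, the object returned by t.test() can be inspected directly. A sketch using R's built-in mtcars dataset, chosen here only so the example runs without the LungCapData file:

```r
# mtcars ships with R; 'am' has two values (0 = automatic, 1 = manual).
res <- t.test(mpg ~ am, data = mtcars)

res$statistic   # t statistic
res$parameter   # Welch-adjusted degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # 95% confidence interval for the difference in means
res$estimate    # the two group means
```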
