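Why does lmerTest give different p-values when the data are very small?

I am new to statistics and to this package. I expected that multiplying or dividing all of my data by the same number (e.g. all *10 or all *100) would leave the p-values unchanged.

But because my data are very small (~10^-9), the p-value is almost 1 at first. As I multiply the data (the "x" in the model below) by larger factors, the p-value decreases until the data reach ~10^-5, after which the p-value no longer changes.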

test <- lmer(x ~ a + b + c + (1 | rep), data = data)
      Estimate  Std. Error          df  t value  Pr(>|t|)
a2  -5.783e-09   1.232e-09   8.879e-05   -4.693     0.999        (raw data)
a2  -5.783e-08   1.232e-08   6.177e-03   -4.693     0.971        (raw data * 10)
a2  -5.783e-07   1.232e-07   3.473e-01   -4.693     0.397        (raw data * 100)
a2  -5.783e-06   1.232e-06   7.851e+00   -4.693   0.00164 **     (raw data * 1000)
a2  -5.783e-05   1.232e-05   9.596e+01   -4.693  8.95e-06 ***    (raw data * 10000)
a2  -0.0005783   0.0001232  95.9638425   -4.693  8.95e-06 ***    (raw data * 100000)

I don't understand why the p-values change at first and then become constant. Can someone explain this to me?

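OK, after some digging, I think I have found a solution and an explanation. As you can see in your example, the t value does not change. The change in the p-values comes from the change in the estimated degrees of freedom. The default method is Satterthwaite's, which, according to this post by one of the package's authors, depends on the dependent variable (see the post here: https://stats.stackexchange.com/questions/342848/satterthwaite-degrees-of-freedom-in-a-mixed-model-change-drastically-depending-o)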
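To make the link between df and p-value concrete, here is a minimal base-R sketch. It assumes the p-value is the usual two-sided tail probability of a t distribution with the reported (fractional) degrees of freedom. The t value and df are copied from the table in the question, and the variable names are only for illustration.

```r
# t value and Satterthwaite df copied from the six rows in the question
t_value <- -4.693
df <- c(8.879e-05, 6.177e-03, 3.473e-01, 7.851e+00, 9.596e+01, 95.9638425)

# Two-sided p-value from a t distribution with (possibly fractional) df
p <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)
print(signif(p, 3))
```

If that assumption holds, the printed values should land close to the Pr(>|t|) column in the question: the t value is the same every time, and only df moves the p-value.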

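Now, over a normal range of magnitudes, the degrees of freedom do not change and the p-values stay the same. In your example you approached the problem from the other direction, noticing that the numbers stop changing past a certain point (once the values of the DV are large enough). Here I use the iris dataset that ships with R to show that they are stable: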

library(lmerTest)  # lmer() with Satterthwaite df and p-values in summary()
# Preparing data
d <- iris
d$width <- d$Sepal.Width
d$Species <- as.factor(d$Species)
# Creating slightly smaller versions of the DV
d$length <- d$Sepal.Length
d$length_10 <- d$Sepal.Length/10
d$length_1e2 <- d$Sepal.Length/1e2
d$length_1e3 <- d$Sepal.Length/1e3
# fitting the models
m1 <- lmer(length ~ width + (1|Species),data = d)
m2 <- lmer(length_10 ~ width + (1|Species),data = d)
m3 <- lmer(length_1e2 ~ width + (1|Species),data = d)
m4 <- lmer(length_1e3 ~ width + (1|Species),data = d)
# df, t value and p-value are the same in every model;
# only Estimate and Std. Error scale with the DV
> summary(m1)$coefficients
             Estimate Std. Error         df  t value     Pr(>|t|)
(Intercept) 3.4061671  0.6683080   3.405002 5.096703 1.065543e-02
width       0.7971543  0.1062064 146.664820 7.505711 5.453404e-12
> summary(m2)$coefficients
              Estimate Std. Error         df  t value     Pr(>|t|)
(Intercept) 0.34061671 0.06683080   3.405002 5.096703 1.065543e-02
width       0.07971543 0.01062064 146.664820 7.505711 5.453404e-12
> summary(m3)$coefficients
               Estimate  Std. Error         df  t value     Pr(>|t|)
(Intercept) 0.034061671 0.006683079   3.405003 5.096703 1.065542e-02
width       0.007971543 0.001062064 146.664820 7.505711 5.453405e-12
> summary(m4)$coefficients
                Estimate   Std. Error         df  t value     Pr(>|t|)
(Intercept) 0.0034061671 0.0006683079   3.405003 5.096703 1.065542e-02
width       0.0007971543 0.0001062064 146.664820 7.505711 5.453405e-12

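However, your numbers are much smaller than these, so I made much smaller versions of the DV to try to recreate your example. As you can see, the degrees of freedom start to approach zero, which drives the p-values toward 1.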

# Much smaller numbers
d$length_1e6 <- d$Sepal.Length/1e6
d$length_1e7 <- d$Sepal.Length/1e7
d$length_1e8 <- d$Sepal.Length/1e8
# fitting the models
m5 <- lmer(length_1e6 ~ width + (1|Species),data = d)
m6 <- lmer(length_1e7 ~ width + (1|Species),data = d)
m7 <- lmer(length_1e8 ~ width + (1|Species),data = d)
# Here we recreate the problem
> summary(m5)$coefficients
                Estimate   Std. Error        df  t value  Pr(>|t|)
(Intercept) 3.406167e-06 6.683079e-07 0.5618686 5.096703 0.2522273
width       7.971543e-07 1.062064e-07 0.6730683 7.505711 0.1599534
> summary(m6)$coefficients
                Estimate   Std. Error         df  t value  Pr(>|t|)
(Intercept) 3.406167e-07 6.683080e-08 0.01224581 5.096703 0.9461743
width       7.971543e-08 1.062064e-08 0.01229056 7.505711 0.9415154
> summary(m7)$coefficients
                Estimate   Std. Error           df  t value  Pr(>|t|)
(Intercept) 3.406167e-08 6.683080e-09 0.0001784636 5.096703 0.9988162
width       7.971543e-09 1.062064e-09 0.0001784738 7.505711 0.9987471

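One possible solution is to use a different approximation, Kenward-Roger (lmerTest relies on the pbkrtest package for this, so it needs to be installed). Let's apply it to the model with the smallest version of the DV, using the following code: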

summary(m7, ddf="Kenward-Roger")$coefficients
                Estimate   Std. Error         df  t value     Pr(>|t|)
(Intercept) 3.406167e-08 6.687077e-09   3.408815 5.093656 1.064475e-02
width       7.971543e-09 1.064752e-09 146.666335 7.486759 6.053660e-12

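As you can see, with this method the numbers for our smallest version of the DV now closely match the stable numbers from the larger versions. Understanding exactly why small numbers are a problem for the Satterthwaite method is beyond my understanding of the methods lmerTest uses, but I know at least one of the package's authors is active here and might be able to provide additional insight. I suspect it may have to do with underflow, since your numbers are so small, but I can't be sure.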
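Another workaround, suggested by the fact that the p-values stop changing once the DV is large enough, is to rescale the DV before fitting and convert the estimates back afterwards. This is only a sketch. Dividing by the standard deviation is my own choice of scale factor, the names `s`, `length_scaled` and `m_scaled` are made up for illustration, and it assumes lmerTest is installed.

```r
library(lmerTest)  # assumes lmerTest (and lme4) are installed

d <- iris
d$width <- d$Sepal.Width
d$length_1e8 <- d$Sepal.Length / 1e8   # same tiny DV as in m7

# Rescale the DV to unit standard deviation before fitting
s <- sd(d$length_1e8)                  # illustrative choice of scale factor
d$length_scaled <- d$length_1e8 / s

m_scaled <- lmer(length_scaled ~ width + (1 | Species), data = d)
coefs <- summary(m_scaled)$coefficients

# Put Estimate and Std. Error back on the original scale;
# df, t value and p-value need no back-transformation
coefs[, c("Estimate", "Std. Error")] <- coefs[, c("Estimate", "Std. Error")] * s
print(coefs)
```

Since the model is linear in the DV, multiplying the estimates and standard errors by `s` recovers the original units. The default Satterthwaite df should then sit in the stable range seen for m1 to m4.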

I hope this helps!
