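This is my first post here, and only my second time using R, so please be gentle.
I am working on a project in which I am trying to run multiple analyses in R with the lmer function from the lme4 and lmerTest packages. To do this, I made a list of the names of the variables I want to analyse and I loop over it with a for loop. It looks like this: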
list <- MyList of IDs
raw <- My Data File
for (i in list) {
model <- lmer (`i` ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), data = raw)
.
.
.
Do something..
.
.
}
However, this produces the following error:
Error in model.frame.default(data = raw, drop.unused.levels = TRUE, formula = paste(i) ~ :
variable lengths differ (found for 'Time')
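I am fairly sure the problem is in the lmer statement, probably with i, because whenever I manually paste a value in place of i, everything works perfectly. However, because of the number of entries in my "list", I need some kind of loop.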
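I have googled a lot and found several answers here that address the same or similar problems, but they did not work for me. Below are links to some of the better solutions, which nevertheless did not solve my problem.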
- Error in model.frame.default: variable lengths differ
- ARIMAX with grouped data in R
- https://rstudio-pubs-static.s3.amazonaws.com/63556_e35cc7e2dfb54a5bb551f3fa4b3ec4ae.html
Can anyone offer some insight into this behaviour?
Edit 1: @r2evans asked for a reproducible example, which is provided below.
#Packages used
library(Matrix)
library(lme4)
library(lmerTest)
library(stringr)
library(readr)
#Starting to read data
rootDir <- getwd()
raw <- read.csv(str_c(rootDir, "/P035aForR.csv"), na = c("", "NA", "0"))
list <- read.csv(str_c(rootDir, "/P035aHeaddersForR2.csv"), na = c("", "NA", "0"), header = FALSE)
#Generate dir for output and the dataframe to store the main results
dir.create("Results", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Data", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/QQPlot", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/RawResiduals", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/PersonResiduals", showWarnings = TRUE, recursive = FALSE, mode = "0777")
Result <- data.frame("","","","","","")
names(Result)<-c("ID","SecretorStatus","Time","BioRep","TechRep","Shapiro") ### <-- Update the headders as needed
Result <- Result[-c(1),]
#Load variables from raw as factors for the analysis
raw$SecretorStatus <- as.factor(raw$SecretorStatus)
raw$Time <- as.factor(raw$Time)
raw$TechRep <- as.factor(raw$TechRep)
raw$BioRep <- as.factor(raw$BioRep)
raw$Random <- as.factor(raw$Random)
for (i in list) {
model <- lmer (`i` ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), data = raw)
anova <- as.data.frame(anova(model, type = 2),)
residuals <- residuals(model, "response")
pvalue <- round(shapiro.test(residuals)$p.value, digits=4 )
out <- as.data.frame(anova(model, type = 2))
write.table(out, "Results/Data/i.txt", col.names=T, row.names=T, quote=F, sep=",")
png(Results/Pictures/QQPlot/i.png)
qqnorm(residuals, sub=paste("p-value of a Shapiro-Wilks test =", pvalue )); qqline(residuals)
dev.off()
Pearson.Residuals <- residuals(model, "pearson")
Raw.Residuals <- residuals(model, "response")
Fitted <- fitted(model)
png(Results/Pictures/RAWResiduals/i.png)
plot(Fitted, Raw.Residuals, main=i)
dev.off()
png(Results/Pictures/PersonResiduals/i.png)
plot(Fitted,Pearson.Residuals, main=i)
dev.off()
tmp <- data.frame(i, pvalue, t(anova[,"Pr(>F)"]))
names(tmp) <- c("ID", "Shapiro", row.names(anova))
Result <- rbind(Result, tmp)
}
This is the point at which the error is produced. It says:
Error in model.frame.default(data = raw, drop.unused.levels = TRUE, formula = i ~ :
variable lengths differ (found for 'Time')
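As for the layout of the data, I cannot provide the full dataset (obviously), but an example demonstrating the structure is given below. This is the data behind the raw variable.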
SecretorStatus Time TechRep BioRep Random ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10 ID11 ID12 ID13 ID14 ID15 ID16
1 1 1 1 1 23342.99 23342.99 0 0 0 0 0 0 0 0 0 0 102829.8 492252.5 0 924436.3
2 1 2 5 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 529782
2 1 1 6 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 506987.7
2 1 2 6 4 0 0 0 0 0 0 0 0 0 0 0 0 0 48786.41 0 618768.5
1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 414852.1 354153.5 850788.9
1 1 1 2 2 0 0 0 0 0 0 0 0 0 99551.51 0 0 322185.6 0 361100.2 819073.6
1 1 2 2 3 0 0 0 0 0 90194.2 0 0 0 73646.15 0 0 0 398369.2 277569.9 613257.3
1 1 1 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 2 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 265760.8 0 0
2 1 1 4 2 0 0 0 0 0 0 0 0 0 0 0 0 61351.9 554385.9 0 656984.3
2 1 2 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 622428.4 0 769227.8
2 1 1 5 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 388584.9
1 2 1 1 1 31454.26 31454.26 0 0 0 0 0 0 0 0 0 0 0 0 0 729234.2
1 2 2 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 333620.4 0 933046.3
1 2 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 834145.3 0 0
1 2 2 2 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 157152.7
1 2 1 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 178179.3 0 812282.9
1 2 2 3 2 0 0 0 0 0 86782.91 0 0 0 0 0 0 0 191167 0 663968.9
2 2 1 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 610315.3
2 2 2 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 339407.1
2 2 1 5 1 0 0 0 0 0 0 0 0 0 213881.1 0 0 0 0 0 298894.5
2 2 2 5 2 0 0 0 0 0 0 0 0 0 81122.63 0 0 0 0 0 170576.6
2 2 1 6 3 0 0 0 0 0 0 0 0 0 53790.86 0 0 0 0 0 205826
2 2 2 6 4 0 0 0 0 0 37900.34 0 0 0 0 0 0 0 0 315754 232529.7
For the data in list, the structure corresponding to the sample data above would be
ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10 ID11 ID12 ID13 ID14 ID15 ID16
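Both files, the one holding the raw data and the one holding the list, are CSV files. Only one error is produced throughout (at least as far as I can see). If you run the code twice, it will also complain that the folders already exist, but apart from that it produces just the one error. Another detail I noticed is that when the error occurs, the variable i is stuck at ID1, so the loop does not get through the whole list; it fails on the first object. I hope this helps clarify things. Please let me know if there are more details you want or need in order to reproduce the error.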
It is hard to tell from what you have given us, but I think this should do it:
pred_vars <- c("Time", "SecretorStatus", "BioRep", "TechRep", "(1|Random)")
list_of_IDs <- names(raw)[startsWith(names(raw), "ID")]
for (i in list_of_IDs) {
  f <- reformulate(pred_vars, response = i)
  model <- lmer(f, data = raw)
  ## ...
}
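To see why the original loop fails, it may help to compare what the two formulas contain. In a formula written as i ~ Time + ..., with or without backticks, the response is the symbol i itself; R does not substitute the loop value into the formula. Because list comes from read.csv(), it is a data frame, so the loop visits its columns and i holds a length-one value such as "ID1". The model frame then compares that single value with the full-length Time column, hence "variable lengths differ". The sketch below reproduces this with base R's model.frame() on a small made-up data frame (toy and its values are assumptions for illustration, not the asker's data) and shows that reformulate() puts the column name itself into the formula.

```r
# Toy data standing in for `raw`; the values are made up for illustration.
toy <- data.frame(
  Time = factor(rep(1:2, each = 4)),
  ID1  = c(2.1, 3.4, 1.9, 2.8, 4.0, 3.7, 4.4, 3.9)
)

i <- "ID1"

# The response here is the symbol `i`, i.e. the length-1 string "ID1",
# not the column ID1, so building the model frame fails.
err <- tryCatch(
  model.frame(i ~ Time, data = toy),
  error = function(e) conditionMessage(e)
)
print(err)

# reformulate() inserts the name stored in `i` into the formula itself.
f <- reformulate("Time", response = i)
print(f)

# With the column name in the formula, the model frame has one row per observation.
mf <- model.frame(f, data = toy)
print(dim(mf))
```

The same reasoning should carry over to lmer(), which builds its model frame from the formula and data in the same way.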
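You do not really need reformulate; you could also use paste, sprintf, or any other string-manipulation mechanism to assemble your formula as a string and then apply as.formula(), but reformulate is a little nicer.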
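As a minimal sketch of those alternatives, the snippet below builds the formula for one response three ways and checks that the results read the same. The variable names response, rhs, and as_text are illustrative helpers, not part of the original code.

```r
response <- "ID1"
rhs <- "Time + SecretorStatus + BioRep + TechRep + (1|Random)"

# String route: assemble the text, then convert it with as.formula().
f_paste   <- as.formula(paste(response, "~", rhs))
f_sprintf <- as.formula(sprintf("%s ~ %s", response, rhs))

# reformulate() route: term labels plus a response name.
f_reform <- reformulate(
  c("Time", "SecretorStatus", "BioRep", "TechRep", "(1|Random)"),
  response = response
)
print(f_paste)

# Compare the deparsed text, since formulas also carry an environment.
as_text <- function(f) paste(deparse(f, width.cutoff = 500L), collapse = " ")
print(as_text(f_paste) == as_text(f_sprintf))
print(as_text(f_paste) == as_text(f_reform))
```

Any of the three objects can be passed as the first argument to lmer() inside the loop.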
A slightly more efficient solution would use refit():
for (i in list_of_IDs) {
  if (i == list_of_IDs[1]) {
    model <- lmer(ID1 ~ Time + SecretorStatus + BioRep + TechRep + (1|Random),
                  data = raw)
  } else {
    model <- refit(model, newresp = raw[[i]])
  }
  ## ...
}
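The refit() pattern can be tried on simulated data, as in the sketch below. Everything in sim is made up for illustration, and the model is reduced to one fixed effect and one random intercept. The sketch assumes the response columns contain no missing values; the question imports zeros as NA, so rows may be dropped differently for each response, and that case is not covered here.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for `raw`: 10 groups with 6 observations each (made-up data).
sim <- data.frame(
  Time   = factor(rep(1:2, times = 30)),
  Random = factor(rep(1:10, each = 6))
)
group_effect <- rnorm(10, sd = 2)[sim$Random]
sim$ID1 <- 5 + (sim$Time == "2") + group_effect + rnorm(60)
sim$ID2 <- 3 - (sim$Time == "2") + group_effect + rnorm(60)

ids  <- c("ID1", "ID2")
fits <- list()
for (i in ids) {
  if (i == ids[1]) {
    # Fit the first response from scratch.
    fits[[i]] <- lmer(ID1 ~ Time + (1 | Random), data = sim)
  } else {
    # Reuse the fitted model structure with a new response vector.
    fits[[i]] <- refit(fits[[1]], newresp = sim[[i]])
  }
}

# Fit ID2 directly for comparison with the refitted model.
direct <- lmer(ID2 ~ Time + (1 | Random), data = sim)
print(fixef(fits[["ID2"]]))
print(fixef(direct))
```

Storing each fit in a named list, rather than overwriting model, keeps the results for every response available after the loop.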
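As stated in ?refit, "the refit() method should be faster because it bypasses the creation of the model representation and goes directly to the optimization step."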
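A side note on the loop body in the question, separate from the formula issue: write.table() is given the fixed string "Results/Data/i.txt", so every iteration would overwrite the same file, and the png() calls pass unquoted paths, which R would try to evaluate as expressions. The folder created as RawResiduals is also later written to as RAWResiduals, which matters on case-sensitive file systems. A minimal sketch of building the paths from the value of i follows; it writes to a temporary directory and uses a placeholder table instead of the ANOVA output, both of which are assumptions made so the snippet runs on its own.

```r
# Temporary location so the sketch does not touch the working directory.
out_dir <- file.path(tempdir(), "Results", "Data")

# recursive = TRUE creates parent folders; showWarnings = FALSE keeps re-runs quiet.
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

ids <- c("ID1", "ID2")
for (i in ids) {
  # Placeholder standing in for as.data.frame(anova(model, type = 2)).
  out <- data.frame(term = c("Time", "TechRep"), p = c(NA_real_, NA_real_))

  # Build the file name from the value of i rather than the literal letter "i".
  write.table(out, file.path(out_dir, paste0(i, ".txt")),
              col.names = TRUE, row.names = TRUE, quote = FALSE, sep = ",")
}
print(list.files(out_dir))
```

The plot files can be handled the same way, for example png(file.path("Results", "Pictures", "QQPlot", paste0(i, ".png"))).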