R model.frame.default中的错误-可变长度不同



这是我第一次在这里发帖,这也是我第二次使用R,所以请温柔一点。

我正在做一个项目,我在R中试图通过lmer函数使用lme4和lmeTest包进行多次分析。为了做到这一点,我列出了我想分析的变量的名称,我使用for循环对其进行了分析。因此,它看起来像这样:

list <- MyList of IDs
raw <- My Data File
for (i in list) {
model <- lmer (`i` ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), data = raw)
.
.
.
Do something..
.
.
}

然而,这会产生以下错误:

Error in model.frame.default(data = raw, drop.unused.levels = TRUE, formula = paste(i) ~  : 
variable lengths differ (found for 'Time')

我很确定这个问题与lmer语句有关,可能与i有关,而且每当我手动将一个值复制到i的位置时,一切都会完美工作。然而,由于我在";列表";我需要某种循环。

我在谷歌上搜索了很多,在这里找到了几个答案,试图解决相同/相似的问题,但对我来说,它们不起作用。下面是一些更好的解决方案的链接列表,但它们并没有解决我的问题。

  • model.frame.default错误:可变长度不同
  • 具有分组数据R的ARIMAX
  • https://rstudio-pubs-static.s3.amazonaws.com/63556_e35cc7e2dfb54a5bb551f3fa4b3ec4ae.html

有人能对这一现象提供一些见解吗?

Edit 1-@r2evans要求提供一个可复制的示例。以下提供了这一点。

#Packages used
library(Matrix) 
library(lme4)
library(lmerTest)
library(stringr)
library(readr)
#Starting to read data
rootDir <- getwd()
raw <- read.csv(str_c(rootDir, "/P035aForR.csv"), na = c("", "NA", "0"))
list <- read.csv(str_c(rootDir, "/P035aHeaddersForR2.csv"), na = c("", "NA", "0"), header = FALSE)
#Generate dir for output and the dataframe to store the main results
dir.create("Results", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Data", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/QQPlot", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/RawResiduals", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/PersonResiduals", showWarnings = TRUE, recursive = FALSE, mode = "0777")
Result <- data.frame("","","","","","")
names(Result)<-c("ID","SecretorStatus","Time","BioRep","TechRep","Shapiro") ### <-- Update the headders as needed
Result <- Result[-c(1),]
#Load variables from raw as factors for the analysis
raw$SecretorStatus <- as.factor(raw$SecretorStatus)
raw$Time <- as.factor(raw$Time)
raw$TechRep <- as.factor(raw$TechRep)
raw$BioRep <- as.factor(raw$BioRep)
raw$Random <- as.factor(raw$Random)
for (i in list) {
model <- lmer (`i` ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), data = raw) 
anova <- as.data.frame(anova(model, type = 2),) 
residuals <- residuals(model, "response")
pvalue <- round(shapiro.test(residuals)$p.value, digits=4 ) 
out <- as.data.frame(anova(model, type = 2))
write.table(out, "Results/Data/i.txt", col.names=T, row.names=T, quote=F, sep=",")
png(Results/Pictures/QQPlot/i.png)
qqnorm(residuals, sub=paste("p-value of a Shapiro-Wilks test =", pvalue )); qqline(residuals) 
dev.off()
Pearson.Residuals <- residuals(model, "pearson")
Raw.Residuals <- residuals(model, "response")
Fitted <- fitted(model)
png(Results/Pictures/RAWResiduals/i.png)
plot(Fitted, Raw.Residuals, main=i)
dev.off()
png(Results/Pictures/PersonResiduals/i.png)
plot(Fitted,Pearson.Residuals, main=i)
dev.off()
tmp <- data.frame(i, pvalue, t(anova[,"Pr(>F)"]))
names(tmp) <- c("ID", "Shapiro", row.names(anova))
Result <- rbind(Result, tmp)
}

在这一点上,错误产生的地方,它说:

Error in model.frame.default(data = raw, drop.unused.levels = TRUE, formula = i ~  : 
variable lengths differ (found for 'Time')

关于数据的设置,我无法提供完整的数据集(显然(,但下面提供了一个演示结构的示例。这是原始变量的数据。

SecretorStatus  Time    TechRep BioRep  Random  ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10    ID11    ID12    ID13    ID14    ID15    ID16
1   1   1   1   1   23342.99    23342.99    0   0   0   0   0   0   0   0   0   0   102829.8    492252.5    0   924436.3
2   1   2   5   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   529782
2   1   1   6   3   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   506987.7
2   1   2   6   4   0   0   0   0   0   0   0   0   0   0   0   0   0   48786.41    0   618768.5
1   1   2   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   414852.1    354153.5    850788.9
1   1   1   2   2   0   0   0   0   0   0   0   0   0   99551.51    0   0   322185.6    0   361100.2    819073.6
1   1   2   2   3   0   0   0   0   0   90194.2 0   0   0   73646.15    0   0   0   398369.2    277569.9    613257.3
1   1   1   3   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
1   1   2   3   1   0   0   0   0   0   0   0   0   0   0   0   0   0   265760.8    0   0
2   1   1   4   2   0   0   0   0   0   0   0   0   0   0   0   0   61351.9 554385.9    0   656984.3
2   1   2   4   3   0   0   0   0   0   0   0   0   0   0   0   0   0   622428.4    0   769227.8
2   1   1   5   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   388584.9
1   2   1   1   1   31454.26    31454.26    0   0   0   0   0   0   0   0   0   0   0   0   0   729234.2
1   2   2   1   2   0   0   0   0   0   0   0   0   0   0   0   0   0   333620.4    0   933046.3
1   2   1   2   3   0   0   0   0   0   0   0   0   0   0   0   0   0   834145.3    0   0
1   2   2   2   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   157152.7
1   2   1   3   1   0   0   0   0   0   0   0   0   0   0   0   0   0   178179.3    0   812282.9
1   2   2   3   2   0   0   0   0   0   86782.91    0   0   0   0   0   0   0   191167  0   663968.9
2   2   1   4   3   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   610315.3
2   2   2   4   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   339407.1
2   2   1   5   1   0   0   0   0   0   0   0   0   0   213881.1    0   0   0   0   0   298894.5
2   2   2   5   2   0   0   0   0   0   0   0   0   0   81122.63    0   0   0   0   0   170576.6
2   2   1   6   3   0   0   0   0   0   0   0   0   0   53790.86    0   0   0   0   0   205826
2   2   2   6   4   0   0   0   0   0   37900.34    0   0   0   0   0   0   0   0   315754  232529.7

对于列表中的数据,上面的示例数据的数据结构将是

ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10    ID11    ID12    ID13    ID14    ID15    ID16

包含原始和列表数据的两个文件都是csv文件。在整个过程中,只产生了一个错误(至少在我看来是这样(。但是,如果您执行两次代码,它会抱怨文件夹已经存在。然而,除此之外,它只生成一个错误。我注意到的另一个细节是,当这个错误发生时,变量i被卡在ID1处,因此它不能浏览整个列表,它在第一个对象处失败。我希望这有助于澄清。请让我知道,如果有更多的细节,你想/需要复制错误。

很难从你给我们的东西中分辨出来,但我认为这应该做到:

pred_vars <- c("Time", "SecretorStatus", "BioRep", "TechRep", "(1|Random)")
list_of_IDs <- names(raw)[startsWith(names(raw), "ID")]
for (i in list_of_IDs) {
f <- reformulate(pred_vars, response = i)
model <- lmer (f, data = raw)
## ...
}

你并不真的需要reformulate,你也可以使用pastesprintf或任何其他字符串操作机制将你的公式组合成一个字符串,然后应用as.formula(),但reformulate要好一点。

稍微更有效的解决方案将使用refit():

for (i in list_of_IDs) {
if (i == list_of_IDs[1]) {
model <- lmer (ID1 ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), 
data = raw)
} else {
model <- refit(model, newresp = raw[[i]])
}
## ...
}

?refit中所述;"refit(("方法应该更快,因为它绕过了模型表示的创建,直接进入优化步骤">

相关内容

  • 没有找到相关文章

最新更新