R - invalid 'times' argument when using lmer and predict



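I am working with physical activity data and follow-up pain data. I have a large dataset, but for the sake of this example I created a small one containing the variables I am interested in.

Because my physical activity data are compositional in nature, I apply compositional data analysis before using these variables as predictors in a mixed-effects model. My goal is to use the predict() function on some new data I created, but I get the following: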

Error in rep(0, nobs) : invalid 'times' argument

I searched on Google and found a post from a few years ago, but the answer there did not work for me.

Here are the dataset and my code:

library("tidyverse")
library("compositions")
library("robCompositions")
library("lme4")
dataset <- structure(list(work = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
department = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
worker = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 
4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
age = c(45, 43, 65, 45, 76, 34, 65, 23, 23, 45, 32, 76), 
sex = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
2L, 2L), .Label = c("1", "2"), class = "factor"),
pain = c(4, 5, 3, 2, 0, 7, 8, 10, 1, 4, 5, 4),
lpa_w = c(45, 65, 43, 76, 98, 65, 34, 56, 2, 3, 12, 34),
mvpa_w = c(12, 54, 76, 87, 45, 23, 65, 23, 54, 76, 23, 54),
lpa_l = c(54, 65, 34, 665, 76, 87, 12, 34, 54, 12, 45, 12),
mvpa_l = c(12, 43, 56, 87, 12, 54, 76, 87, 98, 34, 56, 23)),
class = "data.frame", row.names = c(NA, -12L))
#create compositions of physical activity
dataset$comp_w <- acomp(cbind(lpa_w = dataset[,7], 
mvpa_w = dataset[,8]))
dataset$comp_l <- acomp(cbind(lpa_l = dataset[,9], 
mvpa_l = dataset[,10]))
#Make a grid to use for predictions for composition of lpa_w and mvpa_w
mygrid=rbind(
expand.grid(lpa_w = seq(min(2), max(98),5),
mvpa_w = seq(min(12), max(87), 5)))
griddata <- acomp(mygrid)
#run the model
model <- lmer(pain ~ ilr(comp_w) + age + sex + ilr(comp_l) +
(1 | work / department / worker),
data = dataset)
(prediction = predict(model, newdata = list(comp_w = griddata,
age = rep(mean(dataset$age, na.rm=TRUE),nrow(griddata)), 
sex = rep("1", nrow(griddata)),
comp_l = do.call("rbind", replicate(n=nrow(griddata), mean(acomp(dataset[,12])), simplify = FALSE)),
work = rep(dataset$work, nrow(griddata)),
department = rep(dataset$department, nrow(griddata)),
worker = rep(dataset$worker, nrow(griddata)))))

Any help would be greatly appreciated.

Thanks

Assigning the result of acomp() to an element of a data frame produces a weird data structure that messes up everything downstream.
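As a rough illustration of the general mechanism, here is a base-R sketch that uses a plain numeric matrix in place of an acomp object (an assumption made so that the example needs no extra packages; the small `parts` matrix is made up from the first three rows of the data). Assigning a matrix with `$<-` stores it as a single matrix-valued column, whereas passing it to data.frame() expands it into ordinary vector columns.

```r
# A plain two-column matrix standing in for a composition
# (assumption: an acomp object behaves like a matrix with an extra class)
parts <- cbind(lpa_w = c(45, 65, 43), mvpa_w = c(12, 54, 76))

# $<- stores the whole matrix as ONE column of the data frame
df_dollar <- data.frame(pain = c(4, 5, 3))
df_dollar$comp_w <- parts
names(df_dollar)             # one name for the whole matrix
is.matrix(df_dollar$comp_w)  # the column is still a matrix

# data.frame() expands the matrix into one plain vector column per part
df_flat <- data.frame(pain = c(4, 5, 3), parts)
names(df_flat)
```

With an acomp object the stored column additionally carries its own class and methods, which is where tools such as str() can trip up.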

Build a dataset with these elements (without disturbing the original dataset):

dataset_weird <- dataset
dataset_weird$comp_w <- acomp(cbind(lpa_w = dataset[,7], 
mvpa_w = dataset[,8]))
dataset_weird$comp_l <- acomp(cbind(lpa_l = dataset[,9], 
mvpa_l = dataset[,10]))

The result is so weird that str(dataset_weird), the usual way to inspect the structure of an R object, fails with

$ comp_w: Error in unclass(x)[i, drop = drop] : (subscript) logical subscript too long

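If we run sapply(dataset_weird, class), we see that these elements have class acomp. (They also seem to have an odd print() method: when we print(dataset_weird$comp_w), the result is a matrix of strings, but if we unclass(dataset_weird$comp_w), we can see that the underlying object is numeric [!])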

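The whole problem is a bit tricky because you are working with n-column matrices that are converted to special acomp() objects, which are then converted to (n-1)-dimensional matrices (isometric log-ratio-transformed compositional data), and the columns of those matrices are then used as predictors. The bottom line is that lme4's machinery gets confused if the elements of the data frame are not simple one-dimensional vectors. So you have to create the data frame columns yourself.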
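To make the "n columns in, n-1 columns out" step concrete, here is a minimal base-R sketch for the two-part case only. `ilr2` is a hypothetical helper, not a function from the compositions package, and the sign of its output depends on the chosen basis, so compositions::ilr() may return the same values with the opposite sign.

```r
# Hypothetical helper: isometric log-ratio for a 2-part composition.
# Assumption: one common basis choice; the sign may differ from compositions::ilr()
ilr2 <- function(x) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == 2, all(x > 0))
  closed <- x / rowSums(x)                      # closure: each row sums to 1
  z <- log(closed[, 1] / closed[, 2]) / sqrt(2) # single log-ratio coordinate
  matrix(z, ncol = 1, dimnames = list(NULL, "ilr.1"))
}

parts <- cbind(lpa_w = c(45, 65, 43), mvpa_w = c(12, 54, 76))
coords <- ilr2(parts)
dim(coords)   # two parts collapse to one coordinate per row
```

Because only ratios matter, rescaling all parts by a constant leaves the coordinate unchanged, which is why the result can be stored as an ordinary numeric column and used as a predictor.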

Here is what I came up with, with one piece missing (explained below):

## utility function: *either* uses a matrix argument (`comp_data`)
## *or* extracts relevant columns from a data frame (`data`):
## returns ilr-transformed values as a matrix, with appropriate column names
ilr_dat <- function(data, suffix = NULL, comp_data = NULL) {
if (!is.null(suffix) && is.null(comp_data)) {
comp_data <- as.matrix(data[grep(paste0(suffix,"$"), names(data))])
}
ilrmat <- ilr(acomp(comp_data))
colnames(ilrmat) <- paste0("ilr", suffix, ".", 1:ncol(ilrmat))
return(ilrmat)
}
## augment original data set (without weird compositional elements)
## using data.frame() rather than $<- or rbind() expands matrix arguments
## into data frame columns in a way that R expects
dataset2 <- data.frame(dataset, ilr_dat(dataset, "_l"))
dataset2 <- data.frame(dataset2, ilr_dat(dataset, "_w"))
mygrid <- rbind(
expand.grid(lpa_w = seq(min(2), max(98),5),
mvpa_w = seq(min(12), max(87), 5)))
## generate ilr data for prediction
griddata <- as.data.frame(ilr_dat(comp_data=mygrid, suffix="_w"))
#run the model: ilr(comp_l) **not** included, see below
model <- lmer(pain ~ ilr_w.1 + age + sex  + ## ilr(comp_l) +
(1 | work / department / worker),
data = dataset2)
## utility function for replication
xfun <- function(s) rep(dataset[[s]], nrow(griddata))
predict(model, newdata = data.frame(griddata,
age = mean(dataset$age, na.rm=TRUE),
sex = "1",
work = xfun("work"),
department = xfun("department"),
worker = xfun("worker")))

This seems to work.
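One possible contributor to the original error message, offered as an assumption rather than a confirmed diagnosis of lme4's internals: the failing call passed newdata as a plain list, and nrow() of a list is NULL. Base R's rep() rejects a NULL `times` with exactly this message, as the sketch below shows (the two small `newdata_*` objects are made up for illustration).

```r
# Made-up stand-ins for the two ways of supplying newdata
newdata_list <- list(age = c(45, 43, 65), sex = c("1", "1", "1"))
newdata_df   <- data.frame(age = c(45, 43, 65), sex = c("1", "1", "1"))

nrow(newdata_list)  # a list has no dim attribute
nrow(newdata_df)    # a data frame reports its row count

# Capture the error message instead of stopping the script
msg <- tryCatch(rep(0, nrow(newdata_list)),
                error = function(e) conditionMessage(e))
msg

# The same call succeeds once the row count is a real number
zeros <- rep(0, nrow(newdata_df))
length(zeros)
```

Whatever the exact internal cause, building newdata with data.frame() sidesteps this particular failure mode.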

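The reason I did not include the _l composition/ilr in the model or the prediction is that I do not understand what this statement is doing: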

comp_l = do.call("rbind", replicate(n=nrow(griddata), mean(acomp(dataset[,12])), simplify = FALSE))
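One reading of that statement, sketched in base R: replicate() makes n copies of a single vector and do.call("rbind", ...) stacks them into an n-row matrix, so every prediction row gets the same value. The `center` vector below is a made-up placeholder; whether mean(acomp(...)) really returns a single composition of this shape is an assumption.

```r
# Made-up placeholder for "the mean composition" (assumption: one value per part)
center <- c(lpa_l = 0.4, mvpa_l = 0.6)
n <- 3  # stands in for nrow(griddata)

# n copies of the same vector, stacked as the rows of a matrix
stacked <- do.call("rbind", replicate(n = n, center, simplify = FALSE))
dim(stacked)
stacked

# A more direct way to build the same values
direct <- matrix(center, nrow = n, ncol = length(center), byrow = TRUE,
                 dimnames = list(NULL, names(center)))
```

If that reading is right, the intent was to hold the _l composition constant at its mean while the _w composition varies over the grid.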
