Converting sparse matrices of class "dgCMatrix" to data frames for use in simultaneous regressions in R



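I have 20 variables on which I ran multiple LASSO regressions in R. I take one predictor at a time and regress it on all the other predictors in the model, using the code below: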

library(readxl)
data <- read_excel("data.xlsx")
library(glmnet)
library(coefplot)
A <- as.matrix(data)
# For each column i, regress it on all remaining columns
results <- lapply(seq_len(ncol(A)), function(i) {
  list(
    fit_lasso = glmnet(A[, -i], A[, i], standardize = TRUE, alpha = 0.9),
    cvfit = cv.glmnet(A[, -i], A[, i], standardize = TRUE, type.measure = "mse", nfolds = 5, alpha = 0.9)
  )
})
# Display only the nonzero coefficients
coefficients <- lapply(results, function(x, fun) fun(coef(x$cvfit, s = "lambda.min")), function(x) x[x[, 1L] != 0L, 1L, drop = FALSE])

This produces a list of ncol(data) different sparse matrices of class "dgCMatrix", one for each variable. They are displayed as follows:

> coefficients 
[[1]]
10 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -2.214861e+03
X3           2.812453e-05
X5           5.841003e-01
X6           5.428515e+00
X7           1.080925e+01
X8           2.454695e+01
X10          3.917866e-01
X12          2.488678e+00
X13          5.441626e+00
X14          2.400565e-01
[[2]]
6 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -7.179757e-01
X3           6.563784e-09
X6           1.867302e-02
X8           1.854556e-01
X10         -2.601140e-03
X13          9.105201e-01

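I would like to extract these variables into data frames so that I can use them in later regressions. For one of the sparse matrices of class "dgCMatrix" (let's take the first one, X1), I managed to create a data frame with: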

results[[1L]]$cvfit$lambda.min
coeffs <- coef(results[[1L]]$cvfit, s = "lambda.min")
summs <- summary(coeffs)
ssVarX1 <- data.frame(variables = rownames(coeffs)[summs$i],
                      coefficient = summs$x)
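The snippet above relies on summary() of a sparse matrix returning only the nonzero entries as (i, j, x) triplets. A minimal sketch of that behaviour on a hand-made matrix follows; the values and row names are made up for illustration, and only the Matrix package (which glmnet depends on) is assumed.

```r
library(Matrix)

# Toy one-column sparse matrix standing in for coef(cvfit, s = "lambda.min").
# Rows 2 and 5 are left at zero, as LASSO would do for dropped predictors.
coeffs <- sparseMatrix(
  i = c(1, 3, 4), j = c(1, 1, 1), x = c(-0.7, 0.02, 0.9),
  dims = c(5, 1),
  dimnames = list(c("(Intercept)", "X2", "X3", "X4", "X5"), "1")
)

# summary() lists only the nonzero entries, with their row index in column i
summs <- summary(coeffs)

# Map the row indices back to variable names
toy_df <- data.frame(variables = rownames(coeffs)[summs$i],
                     coefficient = summs$x,
                     stringsAsFactors = FALSE)
print(toy_df)
```

Because the zero rows never appear in the summary, no explicit filtering step is needed with this approach.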

The result is:

variables   coefficient
1  (Intercept) -2.214861e+03
2           X3  2.812453e-05
3           X5  5.841003e-01
4           X6  5.428515e+00
5           X7  1.080925e+01
6           X8  2.454695e+01
7          X10  3.917866e-01
8          X12  2.488678e+00
9          X13  5.441626e+00
10         X14  2.400565e-01

In some cases, however, ssVarX may contain no variables at all, and the result then has the following form:

variable coefficient
1 (Intercept)    106.0629

How can I create data frames for all of the existing sparse matrices of class "dgCMatrix" at once, with each data frame named ssVarX[i], i = 1, ..., ncol(data)?

Following the comments, most of this has now been done with the following code:

library(readxl)
data <-read_excel("data.xlsx")
library(glmnet)
library(coefplot)
A <- as.matrix(data)
results <- lapply(seq_len(ncol(A)), function(i) {
list(
fit_lasso = glmnet(A[, -i], A[, i], standardize = TRUE, alpha = 0.9),
cvfit = cv.glmnet(A[, -i] , A[, i] , standardize = TRUE , type.measure = "mse" , nfolds = 5 , alpha = 0.9)
)
})
coefficients <- lapply(results, function(x, fun) fun(coef(x$cvfit, s = "lambda.min")), function(x) x[x[, 1L] != 0L, 1L, drop = FALSE])
list2env(`names<-`(
lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L]))), 
paste0("ssVarX", seq_along(coefficients))
), envir = .GlobalEnv)
`names<-`(lapply(ls.str(pattern = "ssVarX"), function(x) {
is <- as.integer(sub("(ssVar)?X", "", c(x, get(x, envir = .GlobalEnv)$variable[-1])))
if (length(is) == 1) is <- c(is, seq_along(data)[-is])
as.matrix(coef(lm(data = data[, is])))
}), ls.str(pattern = "ssVarX"))

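However, although the number of selected explanatory variables is correct in every case, the corresponding models do not use the correct variables listed in the ssVarX data frames. I want each Xi to be regressed on the predictors indicated by its ssVarX data frame, with the values taken from data. Why does this happen? And how can I display the summary results of each regression?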

Is this what you want?

lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L])))
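To try this one-liner without the original data.xlsx, here is a minimal sketch on a hand-made list of sparse matrices, including the intercept-only case. The toy values and the helper names make_coef and sparse_to_df are assumptions for illustration, not part of glmnet or Matrix.

```r
library(Matrix)

# Hypothetical helper: build a one-column dgCMatrix with row names,
# shaped like the filtered output of coef(cvfit, s = "lambda.min").
make_coef <- function(values, names) {
  sparseMatrix(i = seq_along(values), j = rep(1L, length(values)), x = values,
               dims = c(length(values), 1L), dimnames = list(names, "1"))
}

coefficients <- list(
  make_coef(c(-2.2, 0.58, 5.4), c("(Intercept)", "X5", "X6")),
  make_coef(106.0629, "(Intercept)")  # no predictors selected
)

# Hypothetical helper: same conversion as the one-liner above
sparse_to_df <- function(x) {
  data.frame(variable = rownames(x), coefficient = unname(x[, 1L]),
             stringsAsFactors = FALSE)
}

# Keep the data frames together in one named list
ssVarX <- setNames(lapply(coefficients, sparse_to_df),
                   paste0("ssVarX", seq_along(coefficients)))
print(ssVarX$ssVarX1)
print(ssVarX$ssVarX2)
```

Keeping the results in a named list rather than as separate global objects makes it easy to loop over them later with lapply.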

Update

list2env(`names<-`(
lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L]))), 
paste0("ssVarX", seq_along(coefficients))
), envir = .GlobalEnv)

Update 2

`names<-`(lapply(ls.str(pattern = "ssVarX"), function(x) {
is <- as.integer(sub("(ssVar)?X", "", c(x, get(x, envir = .GlobalEnv)$variable[-1])))
if (length(is) == 1) is <- c(is, seq_along(data)[-is])
as.matrix(coef(lm(data = data[, is])))
}), ls.str(pattern = "ssVarX"))

Update 3

How about doing it this way?

ssVarX <- lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L])))
lm_results <- lapply(seq_along(ssVarX), function(i, df) {
x_vars <- df[[i]]$variable[-1L]
if (length(x_vars) == 0) x_vars <- "."
fml <- as.formula(paste0("X", i, " ~ ", paste(x_vars, collapse = " + ")))
lm(fml, data = data)
}, ssVarX)

To retrieve the coefficients from the lm results:

lapply(lm_results, function(x) as.matrix(coef(x)))

To retrieve the formulas from the lm results:

lapply(lm_results, function(x) formula(x))

Or simply print everything to the console:

lm_results

If something is still wrong this time, please show me the formula that is incorrect.
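The question also asked how to display the summary of each regression. One possible sketch is to apply summary() to every fitted model. The data below are simulated and the selected variables are hand-picked stand-ins for the ssVarX data frames, so all values here are assumptions for illustration only.

```r
set.seed(1)

# Simulated stand-in for `data`: 50 rows, columns X1..X4
data <- as.data.frame(matrix(rnorm(200), ncol = 4,
                             dimnames = list(NULL, paste0("X", 1:4))))

# Hypothetical selections, shaped like the ssVarX data frames
ssVarX <- list(
  data.frame(variable = c("(Intercept)", "X2", "X3"),
             coefficient = c(0.1, 0.5, -0.2), stringsAsFactors = FALSE),
  data.frame(variable = "(Intercept)", coefficient = 0.3,
             stringsAsFactors = FALSE)  # nothing selected for X2
)

# Same formula construction as in Update 3
lm_results <- lapply(seq_along(ssVarX), function(i) {
  x_vars <- ssVarX[[i]]$variable[-1L]
  if (length(x_vars) == 0) x_vars <- "."  # fall back to all other columns
  fml <- as.formula(paste0("X", i, " ~ ", paste(x_vars, collapse = " + ")))
  lm(fml, data = data)
})
names(lm_results) <- paste0("X", seq_along(lm_results))

# One summary object per regression
summaries <- lapply(lm_results, summary)
print(summaries$X1)        # full summary for the X1 model
print(coef(summaries$X1))  # estimates, std. errors, t and p values
print(sapply(summaries, function(s) s$r.squared))
```

Checking formula(lm_results$X1) against the corresponding ssVarX entry is a quick way to confirm that each model uses the intended predictors.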
