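I have 20 variables on which I ran multiple LASSO regressions in R. I took each predictor in turn and regressed it on the remaining predictors using the code below: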
library(readxl)
data <-read_excel("data.xlsx")
library(glmnet)
library(coefplot)
A <- as.matrix(data)
# For each column i, regress A[, i] on all the other columns.
# Note: alpha = 0.9 is an elastic-net mix close to the LASSO (alpha = 1).
results <- lapply(seq_len(ncol(A)), function(i) {
  list(
    fit_lasso = glmnet(A[, -i], A[, i], standardize = TRUE, alpha = 0.9),
    cvfit = cv.glmnet(A[, -i], A[, i], standardize = TRUE,
                      type.measure = "mse", nfolds = 5, alpha = 0.9)
  )
})
# Keep only the non-zero coefficients at lambda.min
coefficients <- lapply(results, function(res) {
  cf <- coef(res$cvfit, s = "lambda.min")
  cf[cf[, 1L] != 0, 1L, drop = FALSE]
})
This produces a list of ncol(data) different objects of type sparse Matrix of class "dgCMatrix", one per variable. They are displayed as follows:
> coefficients
[[1]]
10 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -2.214861e+03
X3 2.812453e-05
X5 5.841003e-01
X6 5.428515e+00
X7 1.080925e+01
X8 2.454695e+01
X10 3.917866e-01
X12 2.488678e+00
X13 5.441626e+00
X14 2.400565e-01
[[2]]
6 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -7.179757e-01
X3 6.563784e-09
X6 1.867302e-02
X8 1.854556e-01
X10 -2.601140e-03
X13 9.105201e-01
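I would like to extract these variables into data frames so that I can use them later in regressions. For one of the sparse Matrix of class "dgCMatrix" objects (let us take the first one, X1), I managed to create a data frame with: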
results[[1L]]$cvfit$lambda.min
coeffs <- coef(results[[1L]]$cvfit, s = "lambda.min")
# summary() of a sparse matrix lists only its non-zero entries (i = row index, x = value)
summs <- summary(coeffs)
ssVarX1 <- data.frame(variables = rownames(coeffs)[summs$i],
                      coefficient = summs$x)
The result is:
variables coefficient
1 (Intercept) -2.214861e+03
2 X3 2.812453e-05
3 X5 5.841003e-01
4 X6 5.428515e+00
5 X7 1.080925e+01
6 X8 2.454695e+01
7 X10 3.917866e-01
8 X12 2.488678e+00
9 X13 5.441626e+00
10 X14 2.400565e-01
However, in some cases ssVarX may contain no selected variables, and the result then has the following form:
variable coefficient
1 (Intercept) 106.0629
How can I create data frames for all of the existing sparse Matrix of class "dgCMatrix" objects at once, with each data frame named ssVarX[i], i = 1, ..., ncol(data)?
Following the comments, most of this has now been done with the code below:
library(readxl)
data <-read_excel("data.xlsx")
library(glmnet)
library(coefplot)
A <- as.matrix(data)
results <- lapply(seq_len(ncol(A)), function(i) {
list(
fit_lasso = glmnet(A[, -i], A[, i], standardize = T, alpha = 0.9),
cvfit = cv.glmnet(A[, -i] , A[, i] , standardize = TRUE , type.measure = "mse" , nfolds = 5 , alpha = 0.9)
)
})
coefficients <- lapply(results, function(x, fun) fun(coef(x$cvfit, s = "lambda.min")), function(x) x[x[, 1L] != 0L, 1L, drop = FALSE])
list2env(`names<-`(
lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L]))),
paste0("ssVarX", seq_along(coefficients))
), envir = .GlobalEnv)
`names<-`(lapply(ls.str(pattern = "ssVarX"), function(x) {
is <- as.integer(sub("(ssVar)?X", "", c(x, get(x, envir = .GlobalEnv)$variable[-1])))
if (length(is) == 1) is <- c(is, seq_along(data)[-is])
as.matrix(coef(lm(data = data[, is])))
}), ls.str(pattern = "ssVarX"))
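However, even though the number of selected explanatory variables is correct in every case, the corresponding models do not use the correct variables listed in the ssVarX data frames. I want each Xi to be regressed on the predictors indicated by its ssVarX data frame, with the columns taken from data. Why does this happen? And how can I display the summary results of each regression?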
Is this what you are looking for?
lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L])))
Update
list2env(`names<-`(
lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L]))),
paste0("ssVarX", seq_along(coefficients))
), envir = .GlobalEnv)
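For illustration, here is a minimal sketch of the same conversion on toy input, assuming only the Matrix package (which provides the "dgCMatrix" class). The list toy_coefs and its values are made up to mimic the shape of coefficients, and to_df is a hypothetical helper name. It keeps the data frames in one named list rather than in separate global variables, which may be easier to loop over later:

```r
library(Matrix)

# Hypothetical stand-ins for two entries of `coefficients`:
# one-column sparse matrices whose row names are the selected terms.
toy_coefs <- list(
  sparseMatrix(i = 1:3, j = rep(1, 3), x = c(-2.5, 0.58, 5.43),
               dims = c(3, 1),
               dimnames = list(c("(Intercept)", "X5", "X6"), "1")),
  # Case where nothing but the intercept survives
  sparseMatrix(i = 1, j = 1, x = 106.06, dims = c(1, 1),
               dimnames = list("(Intercept)", "1"))
)

# Convert one sparse coefficient column into a two-column data frame
to_df <- function(x) {
  data.frame(variable = rownames(x), coefficient = unname(x[, 1L]))
}

# A named list: ssVarX$ssVarX1, ssVarX$ssVarX2, ...
ssVarX <- setNames(lapply(toy_coefs, to_df),
                   paste0("ssVarX", seq_along(toy_coefs)))
ssVarX$ssVarX2
```

If separate objects are really needed, the named list can still be passed to list2env() as above.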
Update 2
`names<-`(lapply(ls.str(pattern = "ssVarX"), function(x) {
is <- as.integer(sub("(ssVar)?X", "", c(x, get(x, envir = .GlobalEnv)$variable[-1])))
if (length(is) == 1) is <- c(is, seq_along(data)[-is])
as.matrix(coef(lm(data = data[, is])))
}), ls.str(pattern = "ssVarX"))
Update 3
How about this?
ssVarX <- lapply(coefficients, function(x) data.frame(variable = row.names(x), coefficient = unname(x[, 1L])))
lm_results <- lapply(seq_along(ssVarX), function(i, df) {
x_vars <- df[[i]]$variable[-1L]
if (length(x_vars) == 0) x_vars <- "."
fml <- as.formula(paste0("X", i, " ~ ", paste(x_vars, collapse = " + ")))
lm(fml, data = data)
}, ssVarX)
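The formula-building step can be tried in isolation on simulated data. The sketch below is only an illustration: toy and selected are made-up stand-ins for data and for the variable names taken from the ssVarX data frames, and reformulate() from stats is used in place of paste0() plus as.formula():

```r
set.seed(1)
toy <- data.frame(X1 = rnorm(30), X2 = rnorm(30), X3 = rnorm(30))

# Hypothetical selections: X1 keeps X3 only; X2 keeps nothing but the intercept
selected <- list(X1 = "X3", X2 = character(0))

fits <- lapply(names(selected), function(resp) {
  x_vars <- selected[[resp]]
  # Same fallback as above: with no selected predictors, use all other columns
  if (length(x_vars) == 0) x_vars <- "."
  lm(reformulate(x_vars, response = resp), data = toy)
})

# Inspect which formula each model actually used
lapply(fits, formula)
```

Because the response and predictors are referred to by name, the result does not depend on column positions in the data. If an intercept-only model is preferred when nothing is selected, the fallback "." could be replaced by "1".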
To retrieve the coefficients from the lm results:
lapply(lm_results, function(x) as.matrix(coef(x)))
To retrieve the formula from the lm results:
lapply(lm_results, function(x) formula(x))
Or simply print everything to the console:
lm_results
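As for displaying the summary of each regression, one possible approach is to apply summary() to every fitted model. The sketch below uses simulated data and a hypothetical list toy_fits in place of lm_results, so the numbers it prints are not meaningful:

```r
set.seed(42)
toy <- data.frame(X1 = rnorm(40), X2 = rnorm(40), X3 = rnorm(40))

# Hypothetical stand-in for lm_results
toy_fits <- list(
  X1 = lm(X1 ~ X2 + X3, data = toy),
  X2 = lm(X2 ~ X3, data = toy)
)

# Full summaries, one per model
summaries <- lapply(toy_fits, summary)

# Coefficient tables: estimate, standard error, t value and p-value
coef_tables <- lapply(summaries, coef)

# R-squared of each model as a named numeric vector
r_squared <- vapply(summaries, function(s) s$r.squared, numeric(1))

coef_tables$X1
```

With the real objects, lapply(lm_results, summary) should give the same kind of output for every Xi.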
If something is still wrong this time, please show me the formula that comes out incorrectly.