r - Run Lasso regression by group and combine the results



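I want to apply glmnet's lasso regression to a panel dataset, once for each combination of two grouping variables, and then combine all the coefficient results in one data frame in which each lasso regression is identified by its two grouping variables. This is the code I want to run by group: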

library(glmnet)
library(tidyverse)
# Using Lasso variable selection
# Store independent variables into a matrix
X <- as.matrix(iris[, c(2:3)])

# Store dependent variable into a vector (in this case Sepal.Length)
y <- iris$Sepal.Length
# Choose constrained coefficients. In this case positive, between 0 and Inf
lb <- rep(0, length(colnames(X)))
ub <- rep(Inf, length(colnames(X)))

# Cross-validate to choose lambda
cv_las1 <- cv.glmnet(x = X, y = y,
                     lower.limits = lb,
                     upper.limits = ub)
lambda <- cv_las1$lambda.min
# Run glmnet (with min lambda)
las1 <- glmnet(x = X, y = y,
               lower.limits = lb,
               upper.limits = ub,
               lambda = lambda)
# See coefs (one row, one column per coefficient)
c.fit1 <- coef(las1) %>% as.matrix() %>% t() %>% as.data.frame()

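Now I want to run the code above grouped by c(Petal.Width, Species) and get all the lasso regression results in one panel data frame, where each regression is identified by its two grouping variables.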

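Usually I use dplyr's group_by, but I don't know how to apply it with the glmnet package, which needs its inputs specified explicitly. In my actual dataset I have millions of combinations of the two grouping values, so I am looking for something scalable and function-based.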

The following code works, but my problem now is applying it efficiently to a large dataset:

lass <- function(z) {
  data <- z
  X <- as.matrix(data[, c(2:3)])

  # Store dependent variable into a vector (in this case Sepal.Length)
  y <- data$Sepal.Length

  # Choose constrained coefficients. In this case positive, between 0 and Inf
  lb <- rep(0, length(colnames(X)))
  ub <- rep(Inf, length(colnames(X)))

  # Cross-validate to choose lambda
  cv_las1 <- cv.glmnet(x = X, y = y,
                       lower.limits = lb,
                       upper.limits = ub)

  lambda <- cv_las1$lambda.min

  # Run glmnet (with min lambda)
  las1 <- glmnet(x = X, y = y,
                 lower.limits = lb,
                 upper.limits = ub,
                 lambda = lambda)

  # See coefs (one row, one column per coefficient)
  c.fit1 <- coef(las1) %>% as.matrix() %>% t() %>% as.data.frame()
  return(c.fit1)
}

# One lasso fit per (Species, Petal.Width) group, stacked into one data frame
result = iris %>%
  group_by(Species, Petal.Width) %>%
  do(rbind(lass(.)))
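
For comparison, here is a minimal sketch of the same per-group fit written with base R's split and lapply instead of do. It is only one possible arrangement, not a benchmarked solution. The helper name fit_group and the min_n cutoff are hypothetical and not part of the code above. The sketch also reads the coefficients at lambda.min directly from the cv.glmnet object rather than refitting with glmnet, so the values should be close to, but are not guaranteed identical to, those from the two-step version.

```r
library(glmnet)

# Hypothetical helper (not from the original post): fit a non-negative lasso
# on one group's rows and return a one-row data frame of keys + coefficients.
fit_group <- function(d, min_n = 10) {
  # Assumed cutoff: skip groups too small for cross-validation.
  if (nrow(d) < min_n) return(NULL)

  X <- as.matrix(d[, c("Sepal.Width", "Petal.Length")])
  y <- d$Sepal.Length

  # A failed fit drops the group instead of stopping the whole run.
  cv_fit <- tryCatch(
    cv.glmnet(x = X, y = y, lower.limits = 0, upper.limits = Inf),
    error = function(e) NULL
  )
  if (is.null(cv_fit)) return(NULL)

  # Coefficients at lambda.min, read from the cross-validated path.
  coefs <- as.data.frame(t(as.matrix(coef(cv_fit, s = "lambda.min"))))
  cbind(d[1, c("Species", "Petal.Width")], coefs)
}

set.seed(1)  # cross-validation folds are random

# One data frame per (Species, Petal.Width) combination that actually occurs.
groups <- split(iris, list(iris$Species, iris$Petal.Width), drop = TRUE)

# Fit each group, discard skipped groups, and stack the one-row results.
fits <- Filter(Negate(is.null), lapply(groups, fit_group))
result <- do.call(rbind, fits)
rownames(result) <- NULL
```

Because each group is fitted independently, the lapply call could in principle be swapped for parallel::mclapply on a Unix-like system. Whether that helps with millions of small groups would need to be measured on the real data.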
