least squares - LASSO with $\lambda = 0$ and OLS produce different results in R glmnet

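I expected LASSO with no penalty ($\lambda = 0$) to yield the same (or very similar) coefficient estimates as an OLS fit. However, I get different coefficient estimates in R when I put the same data (x, y) into

glmnet(x, y, alpha=1, lambda=0) for the LASSO fit with no penalty, and
lm(y ~ x) for the OLS fit.

Why is that?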

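You are using the function incorrectly. x should be the model matrix, not the raw predictor values. When you do that, you get exactly the same results: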

library(glmnet)
x <- rnorm(500)
y <- rnorm(500)
mod1 <- lm(y ~ x)
# model.matrix() returns the intercept column plus x
xmm <- model.matrix(mod1)
mod2 <- glmnet(xmm, y, alpha=1, lambda=0)
coef(mod1)
coef(mod2)

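I had the same problem, asked around to no avail, and then emailed the package maintainer (Trevor Hastie), who gave the answer. The problem occurs when the series are highly correlated. The solution is to lower the convergence threshold (the thresh argument) in the glmnet() function call itself, rather than via glmnet.control(). The code below uses the built-in dataset EuStockMarkets and fits a VAR with lambda=0. For XSMI, the OLS coefficient is below 1, the default glmnet coefficient is above 1, a difference of about 0.03, and the glmnet coefficient with thresh=1e-14 is very close to the OLS coefficient (a difference of 1.8e-7).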

library(glmnet)
# Use built-in panel data with integrated series
data("EuStockMarkets")
selected_market <- 2
# Take logs for good measure
EuStockMarkets <- log(EuStockMarkets)
# Get dimensions
num_entities <- dim(EuStockMarkets)[2]
num_observations <- dim(EuStockMarkets)[1]
# Build the response with the most recent observations at the top
Y <- as.matrix(EuStockMarkets[num_observations:2, selected_market])
X <- as.matrix(EuStockMarkets[(num_observations - 1):1, ])
# Run OLS, which adds an intercept by default
ols <- lm(Y ~ X)
ols_coef <- coef(ols)
# run glmnet with lambda = 0
fit <- glmnet(y = Y, x = X, lambda = 0)
lasso_coef <- coef(fit)
# run again, but with a stricter threshold
fit_threshold <- glmnet(y = Y, x = X, lambda = 0, thresh = 1e-14)
lasso_threshold_coef <- coef(fit_threshold)
# build a dataframe to compare the two approaches
comparison <- data.frame(ols = ols_coef,
                         lasso = lasso_coef[1:length(lasso_coef)],
                         lasso_threshold = lasso_threshold_coef[1:length(lasso_threshold_coef)]
)
comparison$difference <- comparison$ols - comparison$lasso
comparison$difference_threshold <- comparison$ols - comparison$lasso_threshold
# Show the two values for the autoregressive parameter and their difference
comparison[1 + selected_market, ]

R returns:

           ols    lasso lasso_threshold  difference difference_threshold
XSMI 0.9951249 1.022945       0.9951248 -0.02782045         1.796699e-07
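
The threshold effect can be reproduced without glmnet. The following is a minimal sketch in base R, not glmnet's actual implementation: `cd_least_squares` is a hypothetical helper that runs plain coordinate descent on an unpenalized least-squares problem, and its stopping rule (largest per-coordinate drop in the residual sum of squares, relative to the null residual sum of squares) is an assumption chosen to resemble a relative convergence threshold. The simulated predictors are also made up for illustration.

```r
# Toy coordinate descent for unpenalized least squares; NOT glmnet's code.
cd_least_squares <- function(X, y, tol, max_iter = 100000) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # centering absorbs the intercept
  yc <- y - mean(y)
  col_ss <- colSums(Xc^2)
  beta <- numeric(ncol(Xc))
  resid <- yc
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_along(beta)) {
      # Exact minimizer of the RSS along coordinate j
      step <- sum(Xc[, j] * resid) / col_ss[j]
      beta[j] <- beta[j] + step
      resid <- resid - Xc[, j] * step
      # col_ss[j] * step^2 is the drop in RSS produced by this update
      max_change <- max(max_change, col_ss[j] * step^2)
    }
    # Assumed stopping rule: largest RSS drop relative to the null RSS
    if (max_change < tol * sum(yc^2)) break
  }
  list(beta = beta, sweeps = iter)
}

set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- x1 + 0.05 * rnorm(n)          # correlation with x1 is roughly 0.999
y <- 2 * x1 - x2 + 0.1 * rnorm(n)
X <- cbind(x1, x2)

ols_slopes <- unname(coef(lm(y ~ X))[-1])
loose <- cd_least_squares(X, y, tol = 1e-7)
tight <- cd_least_squares(X, y, tol = 1e-14)

# Largest absolute gap between each coordinate-descent solution and OLS
err_loose <- max(abs(loose$beta - ols_slopes))
err_tight <- max(abs(tight$beta - ols_slopes))
print(c(loose = err_loose, tight = err_tight))
print(c(loose = loose$sweeps, tight = tight$sweeps))
```

With nearly collinear columns each sweep moves the coefficients only slightly, so a loose tolerance can declare convergence while the estimates are still visibly away from the OLS solution. A stricter tolerance costs more sweeps and closes the gap.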

I ran the following code using the "prostate" example dataset from Hastie's book:

library(glmnet)
# yy: data frame holding the prostate data, with the response lpsa in column 9
out.lin1 = lm( lpsa ~ . , data=yy )
out.lin1$coeff
out.lin2 = glmnet( as.matrix(yy[ , -9]), yy$lpsa, family="gaussian", lambda=0, standardize=TRUE )
coefficients(out.lin2)

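The coefficient results are similar. When we use the standardize option, the coefficients returned by glmnet() are in the original units of the input variables. Please check that you are using the "gaussian" family.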

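From the glmnet help: Note also that for "gaussian", glmnet standardizes y to have unit variance before computing its lambda sequence (and then unstandardizes the resulting coefficients); if you wish to reproduce/compare results with other software, best to supply a standardized y.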
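
To see why standardizing the predictors need not change an unpenalized fit, here is a small base R sketch on simulated data (the variables `a` and `b` and their scales are made up for illustration). It fits OLS on standardized predictors and converts the coefficients back to the original units by hand. This only illustrates the back-transformation idea; it does not call glmnet and makes no claim about glmnet's internals.

```r
set.seed(42)
n <- 200
# Two predictors on very different scales
X <- cbind(a = rnorm(n, sd = 10), b = rnorm(n, sd = 0.1))
y <- 1 + 0.3 * X[, "a"] - 4 * X[, "b"] + rnorm(n)

# OLS on the raw predictors
fit_raw <- lm(y ~ X)

# OLS on centered, unit-variance predictors
sds <- apply(X, 2, sd)
Xs <- scale(X, center = TRUE, scale = sds)
fit_std <- lm(y ~ Xs)

# Undo the standardization: divide slopes by the column sds,
# then shift the intercept to account for the centering
slopes_back <- coef(fit_std)[-1] / sds
intercept_back <- coef(fit_std)[1] - sum(slopes_back * colMeans(X))

back_transformed <- unname(c(intercept_back, slopes_back))
raw <- unname(coef(fit_raw))
print(rbind(raw = raw, back_transformed = back_transformed))
```

Without a penalty, rescaling the columns only reparameterizes the same least-squares problem, so the back-transformed coefficients match the raw OLS fit up to floating-point error. Once lambda is positive the penalty depends on the scale of each column, and the standardize choice starts to matter.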
