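I noticed something very strange while running regressions. Essentially, when I estimate a regression on its own and then estimate the same regression inside purrr::map and extract the element, the two objects are not identical. My question is why this happens, or whether it should.
The main reason I ask is that some packages have trouble extracting information from estimates pulled out of purrr::map, but not when I estimate the models separately. Below is a small example with some nonsensical regressions: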
library(fixest)
library(tidyverse)
## creating a formula for a regression example
formula <- as.formula(paste0(
"mpg", "~",
paste("cyl", collapse = "+"),
paste("|"), paste(c("gear", "carb"), collapse = "+")))
## estimating the regression on its own and saving it
mtcars_formula <- feols(formula, cluster = "gear", data = mtcars)
## estimating the same regression twice, but using map
mtcars_list_map <- map(list("gear", "gear"), ~ feols(formula, cluster = ., data = mtcars))
## extracting the first element of the list
is_identical_1 <- mtcars_list_map %>%
  pluck(1)
## THESE ARE NOT IDENTICAL
identical(mtcars_formula, is_identical_1)
I am also tagging this with fixest, since this might be package-specific...
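The differences come largely from differing environments. For example, the third element of each object (mtcars_formula and is_identical_1) is the formula mpg ~ cyl, and mtcars_formula[[3]] == is_identical_1[[3]] returns TRUE. However, the two formulas are attached to different environments: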
> mtcars_formula[[3]] == is_identical_1[[3]]
[1] TRUE
> environment(mtcars_formula[[3]])
<environment: 0x560a2490ef40>
> environment(is_identical_1[[3]])
<environment: 0x560a2554d810>
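The same effect can be reproduced without fixest. The following is a minimal base-R sketch, not taken from the package: the names f_global and f_local are made up for illustration. It builds the same formula in two different environments and compares them both ways.

```r
# Two formulas with the same text, created in different environments.
# f_global and f_local are hypothetical names used only for this sketch.
f_global <- mpg ~ cyl          # environment: wherever this line is run
f_local  <- local(mpg ~ cyl)   # environment: a fresh one created by local()

# `==` deparses language objects and compares the resulting text,
# so the environment attribute plays no role here.
same_text <- f_global == f_local

# identical() also compares attributes, including the attached environment.
same_object <- identical(f_global, f_local)

print(same_text)
print(same_object)
```

So == only tells you that the formulas read the same, while identical() is also sensitive to where each formula was created, which is what differs between a top-level call and a call made inside the function that map() runs.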
Whether these differences are "trivial" depends on your use case, but you can check which elements differ like this:
differences <- list()
for (i in seq_along(mtcars_formula)) {
  # keep every pair of elements that is not strictly identical
  if (!identical(mtcars_formula[[i]], is_identical_1[[i]])) {
    differences[[names(mtcars_formula)[i]]] <- list(mtcars_formula[[i]], is_identical_1[[i]])
  }
}
One element that genuinely differs is the recorded call (the fourth element):
> mtcars_formula[[4]] == is_identical_1[[4]]
[1] FALSE
> c(mtcars_formula[[4]], is_identical_1[[4]])
[[1]]
feols(fml = formula, data = mtcars, cluster = "gear")
[[2]]
feols(fml = formula, data = mtcars, cluster = .)
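This may be related to the error with fwildclusterboot::boottest() that you reported in the comments above. Note that the call stored in the object created with map() refers to cluster = . rather than cluster = "gear". One workaround is to overwrite the cluster entry of the call inside the mapped function: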
mtcars_list_map <- map(list("gear", "gear"), function(x) {
  # create the model
  model <- feols(formula, cluster = x, data = mtcars)
  # manipulate the call object
  model$call$cluster <- x
  # return the model
  model
})
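To see why the stored call differs in the first place, here is a small base-R sketch. It assumes that the model function records its call in the usual match.call() fashion, which I have not verified for feols itself; record_call is a hypothetical stand-in, not a fixest function.

```r
# Hypothetical stand-in for a modelling function that stores its own call.
record_call <- function(cluster) match.call()

# Called directly, the call records the literal argument.
direct_call <- record_call(cluster = "gear")

# Called inside an anonymous function, the call records the unevaluated
# symbol `x`, not the value that `x` holds.
mapped_call <- lapply(list("gear"), function(x) record_call(cluster = x))[[1]]

print(direct_call)
print(mapped_call)

# Overwriting the entry with the actual value makes the two calls read alike.
patched_call <- mapped_call
patched_call$cluster <- "gear"
print(patched_call)
```

Any downstream code that re-evaluates or inspects the stored call will find a symbol that no longer exists outside the mapped function, which is why replacing it with the value itself helps.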