First-difference regression with the packages `plm` and `broom`



I am estimating a first-difference linear regression taken straight from the textbooks:

  • James H. Stock and Mark W. Watson, Introduction to Econometrics, Pearson, 4th edition.
  • Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer, Introduction to Econometrics with R, https://www.econometrics-with-r.org/10-rwpd.html
  • The data come from the package AER, https://cran.r-project.org/web/packages/AER/AER.pdf

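I estimate the model in first differences with the plm() function from the plm package and extract the residuals with the augment() function from the broom package. I get an error message, and I suspect that I am not using the "fd" option correctly and/or that I am misusing augment(). A similar attempt with model="pooling" seems to work. Thanks for any help!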

library(AER)
data(Fatalities)
Fatalities$fatality <- Fatalities$fatal / Fatalities$pop * 10000
library(plm)
library(broom)
plm.pool <- plm(fatality ~ beertax, data=Fatalities, model="pooling")
tidy(plm.pool)  # ok
augment(plm.pool)  # ok

plm.fd <- plm(fatality ~ beertax, data=Fatalities, 
index=c("state", "year"), 
model="fd")
tidy(plm.fd)  # looks ok
# A tibble: 2 × 5
term        estimate std.error statistic p.value
<chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept) -0.00314    0.0119   -0.263    0.792
2 beertax      0.0137     0.285     0.0480   0.962

augment(plm.fd)  # not ok
Error in `$<-.data.frame`(`*tmp*`, ".resid", value = c(`2` = 0.219840293582125,  : 
replacement has 288 rows, data has 336
In addition: Warning message:
In get(.Generic)(e1, e2) :
longer object length is not a multiple of shorter object length

EDIT: A WORKAROUND

So I suspect the problem is that the model frame and the residuals returned by plm do not have the same number of rows:

length(row.names(plm.fd$model)) is 336

length(names(plm.fd$residuals)) is 288.

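Can anyone tell me whether the following is the right way to get the residuals and fitted values from a first-difference estimation?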

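library(dplyr)  # provides %>% and left_join()
# Join residuals and fitted values onto the model frame by row name;
# rows dropped by the FD transformation end up as NA
data.frame(".rownames" = row.names(plm.fd$model), plm.fd$model) %>%
  left_join(data.frame(".rownames" = names(resid(plm.fd)),
                       ".fitted" = fitted(plm.fd),
                       ".resid" = resid(plm.fd)),
            by = ".rownames") -> Fatalities.augmented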
head(Fatalities.augmented)
.rownames fatality  beertax       .fitted       .resid
1         1  2.12836 1.539379            NA           NA
2         2  2.34848 1.788991  0.0034166261  0.219840294
3         3  2.33643 1.714286 -0.0010225479 -0.007890716
4         4  2.19348 1.652542 -0.0008451287 -0.138968054
5         5  2.66914 1.609907 -0.0005835833  0.479380363
6         6  2.71859 1.560000 -0.0006831177  0.053269973

References:

  • https://cran.r-project.org/web/packages/plm/plm.pdf
  • https://cran.r-project.org/web/packages/broom/broom.pdf

Edit, related question:

  • Using broom::augment with panel data models

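This is due to a misunderstanding of first-difference (FD) panel models in broom::augment_columns, or to the lack of a special case for them: the function assumes that the residuals of an FD model have the same length as the predicted values.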

More specifically, this line: ret$.resid <- residuals0(x) (https://github.com/tidymodels/broom/blob/069c21e903174fcf5d491091b7c347a9fdcd2999/R/utilities.R#L256)

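An FD model shrinks the data, so there are fewer residuals than observations used as input to the model estimation. You can see this in the summary output: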

summary(panel3) # FD model
Oneway (individual) effect First-Difference Model
[...]
Balanced Panel: n = 90, T = 7, N = 630
Observations used in estimation: 540
[...]

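The model takes 630 observations as input. The FD transformation drops one observation per group (the individual dimension), so only 540 transformed observations are used: 630 - 90 = 540.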
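The count above can be reproduced without plm. The following is a minimal base-R sketch on a made-up balanced panel with the same dimensions (n = 90, T = 7); the data frame `panel` and its column names are assumptions for illustration only and have nothing to do with the original data.

```r
# Toy balanced panel with the same dimensions as in the summary output
n_groups  <- 90  # individuals (n)
n_periods <- 7   # time periods (T)
panel <- data.frame(
  id   = rep(seq_len(n_groups), each = n_periods),
  time = rep(seq_len(n_periods), times = n_groups),
  y    = seq_len(n_groups * n_periods)  # placeholder values, not real data
)

# First-difference y within each individual;
# diff() returns T - 1 values per group, so one row per individual is lost
dy <- unlist(tapply(panel$y, panel$id, diff))

n_input <- nrow(panel)  # observations going in
n_used  <- length(dy)   # observations left after differencing
print(c(input = n_input, used = n_used))
```

With a balanced panel, the number of differenced observations is always N - n, which is why the residual vector is shorter than the model frame.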

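broom::augment_columns tries to put the predicted values (630) and the residuals (540) into the same data frame, which is bound to fail. If that layout is intended, the missing values could be padded with NA (e.g. set to NA in the first row of each individual).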
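The NA padding could look roughly like the sketch below. `pad_to_rows()` is a hypothetical helper, not part of broom or plm; it assumes that the shorter vector carries names that match the row names of the full data, as the question reports for the residuals of an FD model.

```r
# Hypothetical helper (not a broom/plm function): align a named vector to a
# full set of row names, filling rows without a matching name with NA
pad_to_rows <- function(values, row_ids) {
  vals <- as.numeric(values)                     # drop any special class
  out  <- vals[match(row_ids, names(values))]    # NA where no name matches
  names(out) <- row_ids
  out
}

# Toy illustration: two individuals with three periods each, so rows 1 and 4
# (the first row of each individual) have no residual after differencing
row_ids <- as.character(1:6)
res     <- c("2" = 0.22, "3" = -0.01, "5" = 0.48, "6" = 0.05)
padded  <- pad_to_rows(res, row_ids)
print(padded)
```

Applied to the question, the call would presumably be pad_to_rows(resid(plm.fd), row.names(plm.fd$model)), which amounts to the same thing as the left_join() workaround.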

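My suggestion is to make the developers/maintainers of broom aware of this issue (perhaps by pointing them to this post). An FD panel model from plm can be identified via plm_object$args$model == "fd".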
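As a sketch of how such a check might be wrapped, the helper below tests for an FD model before anything is handed to augment(). `is_fd_plm()` is a hypothetical name; the only thing taken from plm is the args$model element mentioned above. The mock object merely imitates that structure so the check can be tried without fitting a model.

```r
# Hypothetical helper: TRUE only for plm objects estimated with model = "fd"
is_fd_plm <- function(x) {
  inherits(x, "plm") && identical(x$args$model, "fd")
}

# Mock objects that imitate the relevant part of a plm fit (assumption:
# a real fit stores the model type in $args$model, as stated above)
mock_fd   <- structure(list(args = list(model = "fd")),      class = "plm")
mock_pool <- structure(list(args = list(model = "pooling")), class = "plm")

print(is_fd_plm(mock_fd))
print(is_fd_plm(mock_pool))
```

A caller could use this to fall back to a manual, NA-padded join for FD models and to call augment() for everything else.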
