I'm trying to replicate results from Stata. The dataset is an unbalanced panel that looks like this:
ï..region id year grpmlnr grppc cpi
1 region1 1 1998 18245.5 12242.8 167.7
2 region1 1 1999 32060.6 21398.0 140.8
3 region1 1 2000 42074.5 27969.5 120.9
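The ï..region column name, which also appears in the index argument of the plm call, looks like a UTF-8 byte-order mark (BOM) that was read as part of the first header field. Below is a minimal sketch that assumes the data come from a CSV file read with base R's read.csv(). The file is a hypothetical temporary file built from the first row shown above. Declaring the encoding as "UTF-8-BOM" strips the mark, so the column is simply called region.

```r
# Hypothetical CSV file: the first data row from above, prefixed with a
# UTF-8 byte-order mark (bytes EF BB BF) as some spreadsheet exports write it.
csv_path <- tempfile(fileext = ".csv")
csv_text <- "region,id,year,grpmlnr,grppc,cpi\nregion1,1,1998,18245.5,12242.8,167.7\n"
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(csv_text)), csv_path)

# With fileEncoding = "UTF-8-BOM", R drops the mark before parsing the
# header, so the first column is named "region" rather than "ï..region".
dataset <- read.csv(csv_path, fileEncoding = "UTF-8-BOM")
print(names(dataset))
```

With the column read this way, the panel index can be written as index = c("region", "year").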
The original regression in Stata was run as pooled OLS of the form reg y x1 x2 x3 x4 and gives the following output:
Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
x1 | -.0045519 .0070808 -0.64 0.520 -.0184413 .0093376
x2 | -.1598071 .0345597 -4.62 0.000 -.2275982 -.092016
x3 | 4.08e-06 4.16e-06 0.98 0.327 -4.08e-06 .0000122
x4 | -.0000874 .0000244 -3.58 0.000 -.0001354 -.0000395
_cons | .2899655 .0655542 4.42 0.000 .1613767 .4185542
Number of obs = 1489, R-squared = 0.0242, Adj R-squared = 0.0216
When I run
library(plm)
pooledols <- plm(y ~ x1 + x2 + x3 + x4,
                 data = dataset, index = c('ï..region', 'year'), model = 'pooling')
summary(pooledols)
I get
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 1.1228e-02 6.3812e-02 0.1760 0.8603497
x1 3.5982e-03 6.7284e-03 0.5348 0.5928858
x2 4.3466e-02 3.1060e-03 13.9943 < 2.2e-16 ***
x3 1.3737e-05 3.9212e-06 3.5033 0.0004732 ***
x4 -2.7368e-05 2.3573e-05 -1.1610 0.2458259
with number of obs = 1489, R-squared = 0.12554 and Adj R-squared = 0.12319.
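Does anyone have any suggestions? I believe the dataset is the same in both cases. I've seen it suggested elsewhere that the way Stata and R handle unbalanced panels matters for random-effects models, but I'm not sure whether that is relevant here.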
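Pooled OLS ignores the panel structure, so balance by itself should not change the point estimates. What can differ is the data that actually go in. One way to test the assumption that the datasets are identical is to summarise the estimation sample in R and compare it with the output of summarize y x1 x2 x3 x4 if e(sample) run after the Stata regression. This is a minimal sketch. The helper estimation_sample_summary() is hypothetical, and the example uses the first four RegionA rows from the subset posted below.

```r
# Hypothetical helper: summarise only the rows an OLS fit of y on x1..x4
# would use (complete cases on those variables), for side-by-side
# comparison with Stata's `summarize ... if e(sample)`.
estimation_sample_summary <- function(df, vars) {
  used <- df[complete.cases(df[, vars]), vars, drop = FALSE]
  data.frame(
    variable = vars,
    n        = nrow(used),
    mean     = vapply(used, mean, numeric(1)),
    sd       = vapply(used, sd, numeric(1)),
    min      = vapply(used, min, numeric(1)),
    max      = vapply(used, max, numeric(1)),
    row.names = NULL
  )
}

# First four RegionA rows from the posted subset; 1998 has no y value.
example <- data.frame(
  region = "RegionA",
  year   = 1998:2001,
  x1     = 9.412693,
  x2     = c(7.316763, 4.662889, 3.669467, 3.480852),
  x3     = c(655, 720, 741, 748),
  x4     = c(212, 232, 303, 304),
  y      = c(NA, 0.55836, 0.267817, 0.169225)
)

summ <- estimation_sample_summary(example, c("y", "x1", "x2", "x3", "x4"))
print(summ)
```

If n, the means or the ranges disagree with Stata's summary, the two programs are not estimating on the same data.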
Edit: Here is a subset of my data, where x1, x2, x3 and x4 match the variables used in the regression:
region year x1 x2 x3 x4 y
RegionA 1998 9.412693 7.316763 655 212
RegionA 1999 9.412693 4.662889 720 232 0.55836
RegionA 2000 9.412693 3.669467 741 303 0.267817
RegionA 2001 9.412693 3.480852 748 304 0.169225
RegionA 2002 9.412693 3.434518 720 347 0.221187
RegionA 2003 9.412693 3.252523 719 393 0.195911
RegionA 2004 9.412693 2.30941 731 426 0.408409
RegionA 2005 9.412693 2.03653 714 477 0.237577
RegionA 2006 9.412693 1.857329 752 512 0.209052
RegionA 2007 9.412693 1.796764 735 527 0.278823
RegionA 2008 9.412693 1.59614 759 543 0.288872
RegionA 2009 9.412693 1.925464 793 522 -0.04663
RegionA 2010 9.412693 1.685813 779 508 0.267205
RegionA 2011 9.412693 1.570235 767 478 0.241406
RegionA 2012 9.412693 1.689142 787 446 0.068759
RegionA 2013 9.412693 1.819899 810 420 0.03955
RegionA 2014 9.412693 1.859676 814 382 0.083057
RegionA 2015 9.412693 1.860045 806 342 0.11043
RegionA 2016 9.412693 1.921366 822 326 0.048621
RegionA 2017 9.412693 1.911606 823 316 0.074802
RegionB 1998 8.94365 10.81936 633 129
RegionB 1999 8.94365 7.110605 698 152 0.428163
RegionB 2000 8.94365 5.014219 665 192 0.393189
RegionB 2001 8.94365 4.521011 652 208 0.21136
RegionB 2002 8.94365 4.237961 636 276 0.227971
RegionB 2003 8.94365 4.373059 651 301 0.167702
RegionB 2004 8.94365 3.992342 659 320 0.165888
RegionB 2005 8.94365 3.276585 648 345 0.280323
RegionB 2006 8.94365 2.853214 660 392 0.219669
RegionB 2007 8.94365 3.031803 661 401 0.233179
RegionB 2008 8.94365 2.598884 656 457 0.210191
RegionB 2009 8.94365 2.773871 638 472 0.011586
RegionB 2010 8.94365 2.618205 650 443 0.157882
RegionB 2011 8.94365 2.474298 644 410 0.178349
RegionB 2012 8.94365 2.257853 644 387 0.182941
RegionB 2013 8.94365 2.362653 638 336 0.06543
RegionB 2014 8.94365 2.35502 635 320 0.108892
RegionB 2015 8.94365 2.308449 624 282 0.119917
RegionB 2016 8.94365 2.607521 625 252 0.038878
RegionB 2017 8.94365 2.583059 612 223 0.096383
RegionC 1998 9.143153 7.710033 771 120
RegionC 1999 9.143153 4.82562 810 139 0.50267
RegionC 2000 9.143153 4.112946 798 184 0.309938
RegionC 2001 9.143153 3.384044 785 181 0.254107
RegionC 2002 9.143153 3.639285 808 280 0.192077
RegionC 2003 9.143153 3.58782 796 302 0.214723
RegionC 2004 9.143153 2.960462 806 319 0.190094
RegionC 2005 9.143153 2.528599 809 361 0.165926
RegionC 2006 9.143153 2.252368 792 393 0.26823
Edit 2: Here are the results of the first regression, which match Nick Cox's:
lm(formula = y ~ x1 + x2 + x3 + x4, data = replicate)
Residuals:
Min 1Q Median 3Q Max
-0.23488 -0.06966 0.00142 0.05492 0.20161
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.792e+00 8.772e-01 -2.043 0.0475 *
x1 1.865e-01 1.149e-01 1.623 0.1122
x2 8.823e-02 1.989e-02 4.437 6.72e-05 ***
x3 -6.175e-05 3.271e-04 -0.189 0.8512
x4 1.995e-04 2.242e-04 0.890 0.3786
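49 observations (rows) were posted; 3 have missing values for y. This is a plain regression in Stata that pays no attention to the panel structure (let alone any time variable). _cons is the estimated intercept. I have also listed the 3 observations that were excluded automatically. Others may want to post R results.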
. regress y x1 x2 x3 x4
Source | SS df MS Number of obs = 46
-------------+---------------------------------- F(4, 41) = 7.41
Model | .287208881 4 .07180222 Prob > F = 0.0001
Residual | .397187973 41 .009687512 R-squared = 0.4197
-------------+---------------------------------- Adj R-squared = 0.3630
Total | .684396854 45 .015208819 Root MSE = .09843
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .1864634 .1148715 1.62 0.112 -.0455244 .4184511
x2 | .0882264 .0198852 4.44 0.000 .0480674 .1283854
x3 | -.0000618 .0003271 -0.19 0.851 -.0007223 .0005988
x4 | .0001995 .0002242 0.89 0.379 -.0002532 .0006522
_cons | -1.791928 .8772392 -2.04 0.048 -3.563548 -.0203073
------------------------------------------------------------------------------
. l if !e(sample)
+-----------------------------------------------+
| region x1 x2 x3 x4 y |
|-----------------------------------------------|
1. | RegionA 9.412693 7.316763 655 212 . |
21. | RegionB 8.94365 10.81936 633 129 . |
41. | RegionC 9.143153 7.710033 771 120 . |
+-----------------------------------------------+
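The same exclusion can be checked on the R side. This is a minimal sketch that uses the first four years of each region from the posted subset. Under R's default na.action (na.omit), lm() drops the incomplete rows. complete.cases() then identifies those rows, which gives a rough R analogue of Stata's list if !e(sample). The object names posted, model_vars and excluded are chosen only for illustration.

```r
# First four years per region from the posted data; the 1998 rows have no y,
# so fill = TRUE pads them with NA in the y column.
posted <- read.table(header = TRUE, fill = TRUE, text = "
region year x1 x2 x3 x4 y
RegionA 1998 9.412693 7.316763 655 212
RegionA 1999 9.412693 4.662889 720 232 0.55836
RegionA 2000 9.412693 3.669467 741 303 0.267817
RegionA 2001 9.412693 3.480852 748 304 0.169225
RegionB 1998 8.94365 10.81936 633 129
RegionB 1999 8.94365 7.110605 698 152 0.428163
RegionB 2000 8.94365 5.014219 665 192 0.393189
RegionB 2001 8.94365 4.521011 652 208 0.21136
RegionC 1998 9.143153 7.710033 771 120
RegionC 1999 9.143153 4.82562 810 139 0.50267
RegionC 2000 9.143153 4.112946 798 184 0.309938
RegionC 2001 9.143153 3.384044 785 181 0.254107
")

# Pooled OLS that ignores the panel structure, like Stata's `regress`.
fit <- lm(y ~ x1 + x2 + x3 + x4, data = posted)

# Incomplete rows are dropped under the default na.action (na.omit).
print(nobs(fit))

# Rough R analogue of Stata's `list if !e(sample)`.
model_vars <- c("y", "x1", "x2", "x3", "x4")
excluded <- which(!complete.cases(posted[, model_vars]))
print(posted[excluded, ])
```

With the full 49-row subset, the same steps would leave 46 rows, which matches Stata's Number of obs above.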