Panel regression gives error "exog does not have full column rank"



I'm trying to estimate a panel regression (see: https://bashtage.github.io/linearmodels/doc/panel/examples/examples.html).

My data looks like this (this is only a sample snippet; the original file has 11 columns plus a timestamp, and thousands of rows):

What I have

Timestamp   Country Dummy   Pre Post    All_Countries   Timestamp
1993-11-01  1               0   1       6.18    1993-11-01
1993-11-02  1               0   1       6.18    1993-11-02
1993-11-03  1               0   1       6.17    1993-11-03
1993-11-04  1               1   0       6.17    1993-11-04
1993-11-15  1               1   0       6.40    1993-11-15
1993-11-01  2               0   1       7.05    1993-11-01
1993-11-02  2               0   1       7.05    1993-11-02
1993-11-03  2               0   1       7.20    1993-11-03
1993-11-04  2               1   0       7.50    1993-11-04
1993-11-15  2               1   0       7.60    1993-11-15
1993-11-01  3               0   1       7.69    1993-11-01
1993-11-02  3               0   1       7.61    1993-11-02
1993-11-03  3               0   1       7.67    1993-11-03
1993-11-04  3               1   0       7.91    1993-11-04
1993-11-15  3               1   0       8.61    1993-11-15

How to recreate it

import numpy as np
import pandas as pd

# Wide sample: one value column per country
df = pd.DataFrame({"Timestamp": ['1993-11-01', '1993-11-02', '1993-11-03', '1993-11-04', '1993-11-15'],
                   "Pre": [0, 0, 0, 1, 1],
                   "Post": [1, 1, 1, 0, 0],
                   "Austria": [6.18, 6.18, 6.17, 6.17, 6.40],
                   "Belgium": [7.05, 7.05, 7.2, 7.5, 7.6],
                   "France": [7.69, 7.61, 7.67, 7.91, 8.61]},
                  index=[1, 2, 3, 4, 5])
df

# Wide -> long, then number the countries 1, 2, 3
index_data = df.melt(['Timestamp', 'Pre', 'Post'], var_name='Country Dummy', value_name='All_Countries')
index_data['Country Dummy'] = index_data['Country Dummy'].factorize()[0] + 1
# alternative to the line above: pd.Categorical(index_data['Country Dummy']).codes + 1
timestamp = pd.Categorical(index_data['Timestamp'])
index_data = index_data.set_index(['Timestamp', 'Country Dummy'])
index_data['Timestamp'] = timestamp
index_data

**What I did**

!pip install linearmodels
from linearmodels.panel import PooledOLS
import statsmodels.api as sm
exog_vars = ['Pre','Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
print(pooled_res)

**What I get**

ValueError: exog does not have full column rank

Question

Does anyone know what is causing this?

Idea

Is it because my data should be formatted like this (see the example in the link at the top)? If so, how can I get it into this shape?

Timestamp   Country Dummy   Pre Post    All_Countries   Timestamp
1993-11-01  1               0   1       6.18    1993-11-01
1993-11-02                  0   1       6.18    1993-11-02
1993-11-03                  0   1       6.17    1993-11-03
1993-11-04                  1   0       6.17    1993-11-04
1993-11-15                  1   0       6.40    1993-11-15
1993-11-01  2               0   1       7.05    1993-11-01
1993-11-02                  0   1       7.05    1993-11-02
1993-11-03                  0   1       7.20    1993-11-03
1993-11-04                  1   0       7.50    1993-11-04
1993-11-15                  1   0       7.60    1993-11-15
1993-11-01  3               0   1       7.69    1993-11-01
1993-11-02                  0   1       7.61    1993-11-02
1993-11-03                  0   1       7.67    1993-11-03
1993-11-04                  1   0       7.91    1993-11-04
1993-11-15                  1   0       8.61    1993-11-15
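The layout above, with the country label printed once per block, is simply how pandas displays a MultiIndex whose outer level is the country. A minimal sketch, assuming the same three-country sample, builds it by putting `Country Dummy` first and `Timestamp` second; this entity-then-time order is also the one linearmodels documents for panel data. Reordering the index does not change the exog columns, so on its own it cannot remove the rank error.

```python
import pandas as pd

# Same wide sample as in the question
df = pd.DataFrame({
    "Timestamp": ["1993-11-01", "1993-11-02", "1993-11-03", "1993-11-04", "1993-11-15"],
    "Pre": [0, 0, 0, 1, 1],
    "Post": [1, 1, 1, 0, 0],
    "Austria": [6.18, 6.18, 6.17, 6.17, 6.40],
    "Belgium": [7.05, 7.05, 7.2, 7.5, 7.6],
    "France": [7.69, 7.61, 7.67, 7.91, 8.61],
})

# Wide -> long: one row per (country, date)
long = df.melt(["Timestamp", "Pre", "Post"], var_name="Country", value_name="All_Countries")
long["Country Dummy"] = long["Country"].factorize()[0] + 1  # Austria=1, Belgium=2, France=3
long["Timestamp"] = pd.to_datetime(long["Timestamp"])       # date-like time level

# Entity level first, time level second; sort so each country's rows are contiguous
panel = long.set_index(["Country Dummy", "Timestamp"]).sort_index()
print(panel[["Pre", "Post", "All_Countries"]])
```

Printing `panel` shows each `Country Dummy` value once, followed by its five dates, as in the table above.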

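The error is raised because Pre and Post are perfectly collinear: together with the constant added by sm.add_constant, each of the three columns is a linear combination of the other two, so the exog matrix does not have full column rank. You should use only one of Pre and Post, because the other adds no information (and breaks the algebra behind the model, since X'X can no longer be inverted). In this case: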

Pre = 1 - Post
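The dependency can be checked directly. A minimal sketch with NumPy, using the Pre/Post pattern of one country from the sample, compares the matrix rank of the design matrix with its number of columns; the `const` column stands in for what `sm.add_constant` adds.

```python
import numpy as np

# Pre/Post pattern for one country in the sample (5 dates)
post = np.array([1, 1, 1, 0, 0], dtype=float)
pre = 1.0 - post            # Pre = 1 - Post
const = np.ones_like(post)  # the column sm.add_constant adds

X_both = np.column_stack([const, pre, post])  # const = Pre + Post, so rank-deficient
X_post = np.column_stack([const, post])       # drop Pre: full column rank

print(np.linalg.matrix_rank(X_both), X_both.shape[1])  # 2 3
print(np.linalg.matrix_rank(X_post), X_post.shape[1])  # 2 2
```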

This is the same reason you drop one dummy, to serve as the baseline, when running an OLS model (the "dummy variable trap").
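The baseline interpretation can be verified numerically. This sketch uses NumPy's least squares on the 15 long-format sample rows as a stand-in for PooledOLS, assuming that pooled OLS on these columns is ordinary OLS on the stacked data and so gives the same point estimates. With only the constant and Post, the intercept is the mean of the omitted Pre group and the Post coefficient is the difference in group means.

```python
import numpy as np

# Long-format sample from the question: 3 countries x 5 dates
y = np.array([6.18, 6.18, 6.17, 6.17, 6.40,
              7.05, 7.05, 7.20, 7.50, 7.60,
              7.69, 7.61, 7.67, 7.91, 8.61])
post = np.tile([1, 1, 1, 0, 0], 3).astype(float)

# Keep the constant and Post only; Pre (= 1 - Post) is the omitted baseline
X = np.column_stack([np.ones_like(post), post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
const_hat, post_hat = beta

# Intercept = mean of the baseline (Pre) group; Post coefficient = difference in means
print(const_hat, y[post == 0].mean())
print(post_hat, y[post == 1].mean() - y[post == 0].mean())
```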

This should work:

exog_vars = ['Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
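With 11 columns in the real file, other hidden dependencies are possible. A minimal sketch of a pre-fit check, using a hypothetical helper `redundant_columns` (not part of linearmodels or statsmodels), walks the exog columns in order and flags any column that does not raise the matrix rank, i.e. one that is a linear combination of the columns before it.

```python
import numpy as np
import pandas as pd

def redundant_columns(exog: pd.DataFrame) -> list:
    """Hypothetical helper: columns that are linear combinations of earlier ones."""
    kept, dropped = [], []
    for col in exog.columns:
        candidate = kept + [col]
        rank = np.linalg.matrix_rank(exog[candidate].to_numpy(dtype=float))
        if rank == len(candidate):
            kept.append(col)     # adds new information
        else:
            dropped.append(col)  # already spanned by the kept columns
    return dropped

# Same column order that sm.add_constant(index_data[['Pre', 'Post']]) produces
exog = pd.DataFrame({
    "const": 1.0,
    "Pre": [0, 0, 0, 1, 1] * 3,
    "Post": [1, 1, 1, 0, 0] * 3,
})
print(redundant_columns(exog))  # ['Post']
```

Which column gets flagged depends on the column order; either Pre or Post can be dropped, and the result is equivalent up to which group serves as the baseline.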
