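I am trying to estimate a panel regression (see: https://bashtage.github.io/linearmodels/doc/panel/examples/examples.html).
My data looks like this (this is only a sample snippet; the original file has 11 columns plus the timestamp, and several thousand rows):
What I have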
Timestamp Country Dummy Pre Post All_Countries Timestamp
1993-11-01 1 0 1 6.18 1993-11-01
1993-11-02 1 0 1 6.18 1993-11-02
1993-11-03 1 0 1 6.17 1993-11-03
1993-11-04 1 1 0 6.17 1993-11-04
1993-11-15 1 1 0 6.40 1993-11-15
1993-11-01 2 0 1 7.05 1993-11-01
1993-11-02 2 0 1 7.05 1993-11-02
1993-11-03 2 0 1 7.20 1993-11-03
1993-11-04 2 1 0 7.50 1993-11-04
1993-11-15 2 1 0 7.60 1993-11-15
1993-11-01 3 0 1 7.69 1993-11-01
1993-11-02 3 0 1 7.61 1993-11-02
1993-11-03 3 0 1 7.67 1993-11-03
1993-11-04 3 1 0 7.91 1993-11-04
1993-11-15 3 1 0 8.61 1993-11-15
How to recreate it
import numpy as np
import pandas as pd
df = pd.DataFrame({"Timestamp" : ['1993-11-01' ,'1993-11-02', '1993-11-03', '1993-11-04','1993-11-15'], "Pre" : [0 ,0, 0, 1, 1], "Post" : [1 ,1, 1, 0, 0], "Austria" : [6.18 ,6.18, 6.17, 6.17, 6.40],"Belgium" : [7.05, 7.05, 7.2, 7.5, 7.6],"France" : [7.69, 7.61, 7.67, 7.91, 8.61]},index = [1, 2, 3,4,5])
df
index_data = df.melt(['Timestamp','Pre','Post'], var_name='Country Dummy', value_name='All_Countries')
index_data['Country Dummy'] = index_data['Country Dummy'].factorize()[0] + 1
# pd.Categorical(out['Country Dummy']).codes + 1
timestamp = pd.Categorical(index_data['Timestamp'])
index_data = index_data.set_index(['Timestamp', 'Country Dummy'])
index_data['Timestamp'] = timestamp
index_data
**What I did**
!pip install linearmodels
from linearmodels.panel import PooledOLS
import statsmodels.api as sm
exog_vars = ['Pre','Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
print(pooled_res)
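**What I get**
ValueError: exog does not have full column rank.
Question
Does anyone know what causes this problem?
Idea
Is it because my data should be formatted like this instead (see the example in the link at the top)? If so, how can I get it into this shape: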
Timestamp Country Dummy Pre Post All_Countries Timestamp
1993-11-01 1 0 1 6.18 1993-11-01
1993-11-02 0 1 6.18 1993-11-02
1993-11-03 0 1 6.17 1993-11-03
1993-11-04 1 0 6.17 1993-11-04
1993-11-15 1 0 6.40 1993-11-15
1993-11-01 2 0 1 7.05 1993-11-01
1993-11-02 0 1 7.05 1993-11-02
1993-11-03 0 1 7.20 1993-11-03
1993-11-04 1 0 7.50 1993-11-04
1993-11-15 1 0 7.60 1993-11-15
1993-11-01 3 0 1 7.69 1993-11-01
1993-11-02 0 1 7.61 1993-11-02
1993-11-03 0 1 7.67 1993-11-03
1993-11-04 1 0 7.91 1993-11-04
1993-11-15 1 0 8.61 1993-11-15
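The error is raised because Pre is a linear combination of Post and the constant term. You should use only one of the two columns, because the other one adds no information (and breaks the algebra behind the model). In this case: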
Pre = 1 - Post
This is the same reason you drop one dummy, which then serves as the baseline, when running an OLS model.
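One way to see the collinearity directly is to check the rank of the regressor matrix. The sketch below is only an illustration: it rebuilds the constant, Pre and Post columns by hand for five rows (an assumption made to keep it self-contained, rather than using `index_data` from the question) and uses NumPy's `matrix_rank`.

```python
import numpy as np
import pandas as pd

# Hand-built regressors with the same dummy pattern as in the question:
# Post is always 1 - Pre, and "const" is the column add_constant would add.
exog = pd.DataFrame({
    "const": [1.0, 1.0, 1.0, 1.0, 1.0],
    "Pre":   [0, 0, 0, 1, 1],
    "Post":  [1, 1, 1, 0, 0],
})

# Rank of the full matrix: three columns, but only two are independent.
rank_full = np.linalg.matrix_rank(exog.to_numpy(dtype=float))

# Rank after dropping Pre: two columns, both independent.
reduced = exog[["const", "Post"]]
rank_reduced = np.linalg.matrix_rank(reduced.to_numpy(dtype=float))

print("columns:", exog.shape[1], "rank:", rank_full)
print("columns:", reduced.shape[1], "rank:", rank_reduced)
```

When the rank is smaller than the number of columns, the matrix does not have full column rank, which is the condition the error message complains about.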
This should work:
exog_vars = ['Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
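On the layout question: the second table in the post looks like the way pandas prints a MultiIndex whose outer level is the country and whose inner level is the date. As far as I know, linearmodels expects exactly that order (entity first, time second), with a datetime or numeric time level. The following is a rough sketch under those assumptions, not a tested drop-in fix; the names `long_df` and `panel` are made up for this example.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Timestamp": ["1993-11-01", "1993-11-02", "1993-11-03", "1993-11-04", "1993-11-15"],
    "Pre": [0, 0, 0, 1, 1],
    "Post": [1, 1, 1, 0, 0],
    "Austria": [6.18, 6.18, 6.17, 6.17, 6.40],
    "Belgium": [7.05, 7.05, 7.2, 7.5, 7.6],
    "France": [7.69, 7.61, 7.67, 7.91, 8.61],
})

# Wide to long: one row per (date, country) pair.
long_df = df.melt(["Timestamp", "Pre", "Post"],
                  var_name="Country Dummy", value_name="All_Countries")

# Integer code per country, and real datetimes instead of strings.
long_df["Country Dummy"] = long_df["Country Dummy"].factorize()[0] + 1
long_df["Timestamp"] = pd.to_datetime(long_df["Timestamp"])

# Entity (country) as the outer index level, time as the inner level.
panel = long_df.set_index(["Country Dummy", "Timestamp"]).sort_index()
print(panel)
```

With this index, `panel[["Post"]]` and `panel["All_Countries"]` could be passed to the model in the same way as `index_data` above. Reordering the index does not remove the rank problem by itself, so one of Pre and Post still has to be dropped.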