A simple application of rolling regression in pandas



Consider this simple example:

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
df = pd.DataFrame({'a': [1,3,5,7,4,5,6,4,7,8,9,1,3,5,7,4,5,6,4,7,8,9],
                   'b': [3,5,6,2,4,6,2,5,7,1,9,5,3,2,5,4,3,6,4,1,1,9]})

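I am trying to run a rolling regression of a on b, using the simplest pandas tool available: apply. I want to use apply because I want to keep the flexibility of returning any parameter of the regression.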

However, the simple code below does not work:

df.rolling(10).apply(lambda x: smf.ols('a ~ b', data = x).fit())
  File "<string>", line 1, in <module>
PatsyError: Error evaluating factor: NameError: name 'b' is not defined
    a ~ b
        ^

What is the problem? Thanks!

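rolling apply cannot interact with multiple columns at the same time, nor can it produce non-numeric values. Instead, we need to take advantage of the fact that rolling objects are iterable. We also need to handle min_periods ourselves, because an iterated rolling object yields every window regardless of the other rolling arguments.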
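To make the point about iteration concrete, here is a minimal sketch that only inspects the windows and fits no model. It assumes a pandas version in which Rolling objects can be iterated (1.1 or later, as far as I know), and it uses a shortened copy of the question's data; the names window_lengths, window_columns and full_windows are just illustrative.

```python
import pandas as pd

# Shortened copy of the question's data (first 12 rows)
df = pd.DataFrame({
    'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9, 1],
    'b': [3, 5, 6, 2, 4, 6, 2, 5, 7, 1, 9, 5],
})

window = 10  # same window size as in the question

# Iterating a Rolling object yields one sub-DataFrame per row of df.
# Each sub-DataFrame keeps every column, so a formula can see both a and b.
windows = list(df.rolling(window))

window_lengths = [len(w) for w in windows]
window_columns = [list(w.columns) for w in windows]

# The leading windows are shorter than `window`; they are not skipped,
# which is why the length check has to be done by hand.
full_windows = [w for w in windows if len(w) >= window]

print(window_lengths)
print(len(full_windows))
```

The short leading windows are the reason a length check is needed before fitting anything on each window.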

We can then write a function that produces one row of results per window from the regression, along these lines:

def process(x):
    if len(x) >= 10:
        reg = smf.ols('a ~ b', data=x).fit()
        print(reg.params)
        return [
            # b from params
            reg.params['b'],
            # b from tvalues
            reg.tvalues['b'],
            # Both lower and upper b from conf_int()
            *reg.conf_int().loc['b', :].tolist()
        ]
    # Return NaN in the same dimension as the results
    return [np.nan] * 4

df = df.join(
    # join new DataFrame back to original
    pd.DataFrame(
        (process(x) for x in df.rolling(10)),
        columns=['coef', 't', 'lower', 'upper']
    )
)

df:

    a  b      coef         t     lower     upper
0   1  3       NaN       NaN       NaN       NaN
1   3  5       NaN       NaN       NaN       NaN
2   5  6       NaN       NaN       NaN       NaN
3   7  2       NaN       NaN       NaN       NaN
4   4  4       NaN       NaN       NaN       NaN
5   5  6       NaN       NaN       NaN       NaN
6   6  2       NaN       NaN       NaN       NaN
7   4  5       NaN       NaN       NaN       NaN
8   7  7       NaN       NaN       NaN       NaN
9   8  1 -0.216802 -0.602168 -1.047047  0.613442
10  9  9  0.042781  0.156592 -0.587217  0.672778
11  1  5  0.032086  0.097763 -0.724742  0.788913
12  3  3  0.113475  0.329006 -0.681872  0.908822
13  5  2  0.198582  0.600297 -0.564258  0.961421
14  7  5  0.203540  0.611002 -0.564646  0.971726
15  4  4  0.236599  0.686744 -0.557872  1.031069
16  5  3  0.293651  0.835945 -0.516403  1.103704
17  6  6  0.314286  0.936382 -0.459698  1.088269
18  4  4  0.276316  0.760812 -0.561191  1.113823
19  7  1  0.346491  1.028220 -0.430590  1.123572
20  8  1 -0.492424 -1.234601 -1.412181  0.427332
21  9  9  0.235075  0.879433 -0.381326  0.851476

Setup:

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
df = pd.DataFrame({
    'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9, 1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
    'b': [3, 5, 6, 2, 4, 6, 2, 5, 7, 1, 9, 5, 3, 2, 5, 4, 3, 6, 4, 1, 1, 9]
})

Rolling.apply applies the rolling operation to each column separately (related question).
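A small sketch of what that means in practice: the function passed to apply is called with one column's window at a time, so it never sees a and b together. The helper record and the list seen are hypothetical names used only to log what arrives; the data is a shortened copy of the question's.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5],
                   'b': [3, 5, 6, 2, 4, 6]})

seen = []  # log of (type name, number of dimensions, has a .columns attribute)

def record(window):
    # With the default raw=False, each window arrives as a one-column Series
    seen.append((type(window).__name__, window.ndim, hasattr(window, 'columns')))
    # apply expects a single number back for every window
    return window.mean()

result = df.rolling(3).apply(record)

print(set(seen))
print(result)
```

Because each window is a single Series with no column labels, a formula such as 'a ~ b' has nothing named b to look up, which would be consistent with the PatsyError in the question.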

Based on user3226167's answer in this thread, the easiest way to do what you want seems to be RollingOLS.from_formula from statsmodels.regression.rolling:

from statsmodels.regression.rolling import RollingOLS
df = pd.DataFrame({'a': [1,3,5,7,4,5,6,4,7,8,9,1,3,5,7,4,5,6,4,7,8,9],
                   'b': [3,5,6,2,4,6,2,5,7,1,9,5,3,2,5,4,3,6,4,1,1,9]})
model = RollingOLS.from_formula('a ~ b', data=df, window=10)
reg_obj = model.fit()
# estimated coefficient
b_coeff = reg_obj.params['b'].rename('coef')
# b t-value 
b_t_val = reg_obj.tvalues['b'].rename('t')
# 95 % confidence interval of b
b_conf_int = reg_obj.conf_int(cols=[1]).droplevel(level=0, axis=1)
# join all the desired information to the original df
df = df.join([b_coeff, b_t_val, b_conf_int])

Here reg_obj is a RollingRegressionResults object, which holds a lot of information about the regression (see the documentation for all of its attributes).

Output:

>>> type(reg_obj)
<class 'statsmodels.regression.rolling.RollingRegressionResults'>
>>> df
    a  b      coef         t     lower     upper
0   1  3       NaN       NaN       NaN       NaN
1   3  5       NaN       NaN       NaN       NaN
2   5  6       NaN       NaN       NaN       NaN
3   7  2       NaN       NaN       NaN       NaN
4   4  4       NaN       NaN       NaN       NaN
5   5  6       NaN       NaN       NaN       NaN
6   6  2       NaN       NaN       NaN       NaN
7   4  5       NaN       NaN       NaN       NaN
8   7  7       NaN       NaN       NaN       NaN
9   8  1 -0.216802 -0.602168 -0.922460  0.488856
10  9  9  0.042781  0.156592 -0.492679  0.578240
11  1  5  0.032086  0.097763 -0.611172  0.675343
12  3  3  0.113475  0.329006 -0.562521  0.789472
13  5  2  0.198582  0.600297 -0.449786  0.846949
14  7  5  0.203540  0.611002 -0.449372  0.856452
15  4  4  0.236599  0.686744 -0.438653  0.911851
16  5  3  0.293651  0.835945 -0.394846  0.982147
17  6  6  0.314286  0.936382 -0.343553  0.972125
18  4  4  0.276316  0.760812 -0.435514  0.988146
19  7  1  0.346491  1.028220 -0.313981  1.006963
20  8  1 -0.492424 -1.234601 -1.274162  0.289313
21  9  9  0.235075  0.879433 -0.288829  0.758978
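Note that coef and t agree with the iteration-based answer, but lower and upper do not. A likely explanation is that the plain OLS fit builds its interval from the Student's t distribution, while the rolling results here use the normal distribution; I believe RollingOLS.fit accepts a use_t argument for this, but treat that as an assumption and check the documentation. The sketch below reproduces both intervals for the first full window (rows 0 to 9) from a single OLS fit, using scipy.stats for the critical values; the variable names are illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.DataFrame({'a': [1,3,5,7,4,5,6,4,7,8,9,1,3,5,7,4,5,6,4,7,8,9],
                   'b': [3,5,6,2,4,6,2,5,7,1,9,5,3,2,5,4,3,6,4,1,1,9]})

# First full window of size 10, i.e. the row labelled 9 in the output tables
fit = smf.ols('a ~ b', data=df.iloc[:10]).fit()
coef = fit.params['b']
se = fit.bse['b']

# 95% two-sided critical values: Student's t with the residual degrees
# of freedom versus the standard normal
t_crit = stats.t.ppf(0.975, fit.df_resid)
z_crit = stats.norm.ppf(0.975)

ci_t = (coef - t_crit * se, coef + t_crit * se)  # compare with the first answer's row 9
ci_z = (coef - z_crit * se, coef + z_crit * se)  # compare with the RollingOLS row 9

print(ci_t)
print(ci_z)
```

With only 10 observations per window the t-based interval is noticeably wider, so it is worth deciding which of the two you want before comparing results.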
