OLS Breusch Pagan test in Python



I use the statsmodels package to estimate my OLS regression. Now I want to run the Breusch-Pagan test. I used the pysal package for this test, but the function returns an error:

import statsmodels.api as sm
import pysal
model = sm.OLS(Y,X,missing = 'drop')
rs = model.fit()
pysal.spreg.diagnostics.breusch_pagan(rs)

It returns the error:

AttributeError: 'OLSResults' object has no attribute 'u'

What should I do?

The problem is that the regression results instance from statsmodels is not compatible with the regression results instance expected by pysal.

You can use het_breuschpagan from statsmodels, which takes the OLS residuals and the candidate explanatory variables for the heteroskedasticity, so it does not rely on a specific model or model implementation.

Documentation: https://www.statsmodels.org/devel/generated/statsmodels.stats.diagnostic.het_breuschpagan.html

There is an example here: https://www.statsmodels.org/devel/examples/notebooks/generated/regression_diagnostics.html
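For instance, adapted to the question's setup, something like this should work (a minimal sketch with made-up data; the X and Y below are placeholders for your own arrays, and het_breuschpagan expects the explanatory variables to include a constant):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# made-up data purely for illustration; replace X and Y with your own arrays
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))   # design matrix including a constant
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=np.abs(X[:, 1]) + 0.5, size=100)

rs = sm.OLS(Y, X, missing='drop').fit()

# pass the OLS residuals and the candidate explanatory variables;
# the function returns the LM statistic, its p-value, the F statistic and its p-value
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(rs.resid, rs.model.exog)
print(lm_stat, lm_pvalue)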

I don't know whether there are any essential differences between the implementations of the Breusch-Pagan test.

It looks like the name is misspelled in statsmodels.

Edit: The spelling of the name was corrected in statsmodels version 0.9. The old, incorrect spelling was breushpagan.
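If your statsmodels version might predate 0.9, an import guard along these lines should cover both spellings (a sketch, not tested against every release):

try:
    from statsmodels.stats.diagnostic import het_breuschpagan  # statsmodels >= 0.9
except ImportError:
    # older releases only expose the misspelled name
    from statsmodels.stats.diagnostic import het_breushpagan as het_breuschpagan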

Since I tend not to use the statsmodels library, I created a Python function that performs the Breusch-Pagan test. It uses multiple linear regression from scikit-learn.

import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import chi2

def breusch_pagan_test(x, y):
    '''
    Breusch-Pagan test for heteroskedasticity in a linear regression model:
    H_0 = No heteroskedasticity.
    H_1 = Heteroskedasticity is present.
    Inputs:
    x = a numpy.ndarray containing the predictor variables. Shape = (nSamples, nPredictors).
    y = a 1D numpy.ndarray containing the response variable. Shape = (nSamples, ).
    Outputs a list containing three elements:
    1. the Breusch-Pagan test statistic.
    2. the p-value for the test.
    3. the test result.
    '''
    if y.ndim != 1:
        raise SystemExit('Error: y has more than 1 dimension.')
    if x.shape[0] != y.shape[0]:
        raise SystemExit('Error: the number of samples differs between x and y.')
    else:
        n_samples = y.shape[0]

    # fit an OLS linear model to y using x:
    lm = LinearRegression()
    lm.fit(x, y)

    # calculate the squared errors:
    err = (y - lm.predict(x))**2

    # fit an auxiliary regression to the squared errors:
    # why?: to estimate the variance in err explained by x
    lm.fit(x, err)
    pred_err = lm.predict(x)
    del lm

    # calculate the coefficient of determination:
    ss_tot = np.sum((err - np.mean(err))**2)
    ss_res = np.sum((err - pred_err)**2)
    r2 = 1 - (ss_res / ss_tot)
    del err, pred_err, ss_res, ss_tot

    # calculate the Lagrange multiplier:
    LM = n_samples * r2
    del r2

    # calculate p-value. degrees of freedom = number of predictors.
    # this is equivalent to (p - 1) parameter restrictions in Wikipedia entry.
    # scipy.stats.chisqprob was removed from SciPy; chi2.sf gives the same upper-tail probability.
    pval = chi2.sf(LM, x.shape[1])

    if pval < 0.01:
        test_result = 'Heteroskedasticity present at 99% CI.'
    elif pval < 0.05:
        test_result = 'Heteroskedasticity present at 95% CI.'
    else:
        test_result = 'No significant heteroskedasticity.'

    return [LM, pval, test_result]
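A quick sanity check on synthetic heteroskedastic data (my own illustrative example, not part of the original function):

rng = np.random.RandomState(0)
x = rng.normal(size=(200, 2))
# make the noise scale depend on the first predictor so heteroskedasticity is present
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(scale=np.abs(x[:, 0]) + 0.5, size=200)
print(breusch_pagan_test(x, y))
# expect a large LM statistic and a small p-value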
