Plotting residuals of masked values with statsmodels



I'm using statsmodels.api to compute the statistical parameters for an OLS fit between two variables:

import numpy as np
import statsmodels.api as sm
from scipy import stats

def computeStats(x, y, yName):
    '''
    Takes as an argument an array, and a string for the array name.
    Uses Ordinary Least Squares to compute the statistical parameters for the
    array against log(z), and determines the equation for the line of best fit.
    Returns the results summary, residuals, statistical parameters in a list, and the
    best fit equation.
    '''
    #   Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)
    #   Compute model parameters
    model = sm.OLS(y, sm.add_constant(x), missing='drop')
    results = model.fit()
    residuals = results.resid
    #   Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0]*x + params[1]
    #   Raw string so that \pm and \times reach the math renderer unchanged
    fitEquation = r'$(%s)=(%.4g \pm %.4g) \times redshift+%.4g$'%(yName,
                    params[0],  #   slope
                    params[4],  #   stderr in slope
                    params[1])  #   y-intercept
    return results, residuals, params, fit, fitEquation

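The second part of the function (using stats.linregress) handles the masked values fine, but statsmodels does not. When I try to plot the residuals against the x values with plt.scatter(x, resids), the dimensions don't match: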

ValueError: x and y must be the same size

because there are 29007 x values and 11763 residuals (that's how many y values made it through the masking procedure). I tried changing the model variable to

model = sm.OLS(y[mask], sm.add_constant(x[mask]), missing= 'drop')

but this had no effect.

How can I scatter-plot the residuals against the x values they correspond to?

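Hi @jim421616. Because statsmodels drops the rows with missing values, the residuals are shorter than your original x. Use the exog array stored on the fitted model, which holds only the rows that were kept, for the scatter plot. Here model is the fitted results object (results in your function), and column 1 of exog is x because column 0 is the constant added by sm.add_constant: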

plt.scatter(model.model.exog[:,1], model.resid)

For reference, a complete dummy example:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate data
x = np.random.rand(1000)
y = np.sin(x*25) + 0.1*np.random.rand(1000)
# Set some values to NaN
y[np.random.choice(np.arange(1000), size=100)] = np.nan
x[np.random.choice(np.arange(1000), size=80)] = np.nan

# Fit the model; rows containing NaN are dropped
model = sm.OLS(y, sm.add_constant(x), missing='drop').fit()
print(model.summary())
# Plot residuals against the x values that were actually used in the fit
plt.scatter(model.model.exog[:,1], model.resid)
plt.show()
