Python: multiple regression function

I have the following DataFrame:

import pandas as pd
from sklearn import linear_model
import statsmodels.api as sm
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]        
}
df = pd.DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])

At the moment, I can run a multiple regression of "Stock_Index_Price" on "Interest_Rate" and "Unemployment_Rate" using the following function:

def perform_regression_multiple(y, x1, x2=""):
    test = df[[y, x1, x2]].reset_index(drop=True)

    X = test[[x1, x2]]
    Y = test[[y]]

    regr = linear_model.LinearRegression()
    regr.fit(X, Y)
    model = sm.OLS(Y, X).fit()
    predictions = model.predict(X)
    print_model = model.summary()
    print(print_model)

#===========================================================================
perform_regression_multiple('Stock_Index_Price', 'Interest_Rate', 'Unemployment_Rate')

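However, when I try to use the function above to run a simple linear regression (for example, with "Interest_Rate" as the only explanatory variable), I get the following error message: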

perform_regression_multiple('Stock_Index_Price', 'Interest_Rate')

KeyError: "[''] not in index"

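Apparently both x1 and x2 have to be specified; otherwise the function does not work. How should I modify the function so that I can choose the number of explanatory variables? The goal is to extend the regression model with additional factors.

Any advice would be much appreciated!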

Look at how the function is defined:

def perform_regression_multiple(y, x1, x2=""):

And then at how you call it:

perform_regression_multiple('Stock_Index_Price', 'Interest_Rate')

With that call, you are telling the function that y='Stock_Index_Price', x1='Interest_Rate' and x2='', which is the default value.

In the first line of the function, you use x2 as a column name:

test = df[[y, x1, x2]].reset_index(drop=True)

Since x2 defaults to '' (an empty string), the error is telling you that no column named '' exists in the DataFrame.
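To make the failure concrete, here is a small sketch in plain Python (no pandas needed) that rebuilds the column list the original function asks for. The helper name `requested_columns` and the `available` list are illustrative assumptions, not part of the question's code; `available` simply repeats the DataFrame's column names.

```python
def requested_columns(y, x1, x2=""):
    # Mirrors the list passed to df[[...]] in the first line of the
    # original function (hypothetical helper, for illustration only).
    return [y, x1, x2]


# Column names of the DataFrame from the question.
available = ['Year', 'Month', 'Interest_Rate',
             'Unemployment_Rate', 'Stock_Index_Price']

cols = requested_columns('Stock_Index_Price', 'Interest_Rate')
missing = [c for c in cols if c not in available]

print(cols)     # ['Stock_Index_Price', 'Interest_Rate', '']
print(missing)  # ['']
```

The empty string ends up in the list of requested columns, and that is the name pandas reports as "not in index".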

If you want to run the regression with either one or two variables, do this:

def perform_regression_multiple(y, x1, x2=None):
    if x2:
        test = df[[y, x1, x2]].reset_index(drop=True)

        X = test[[x1, x2]]
    else:
        test = df[[y, x1]].reset_index(drop=True)

        X = test[[x1]]
    Y = test[[y]]

    regr = linear_model.LinearRegression()
    regr.fit(X, Y)
    model = sm.OLS(Y, X).fit()
    predictions = model.predict(X)
    print_model = model.summary()
    print(print_model)

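You could also keep the empty string as the default; the if would still work the same way, because an empty string is falsy just like None.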
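A quick way to check that claim is to isolate the branching logic. The sketch below uses a hypothetical helper, `pick_regressors`, which is not part of the answer's code and only returns the list of explanatory columns that the `if x2:` test would select.

```python
def pick_regressors(x1, x2=None):
    # `if x2:` is False for both None and "", so either default
    # falls through to the single-variable branch.
    if x2:
        return [x1, x2]
    return [x1]


print(pick_regressors('Interest_Rate'))                       # ['Interest_Rate']
print(pick_regressors('Interest_Rate', ''))                   # ['Interest_Rate']
print(pick_regressors('Interest_Rate', 'Unemployment_Rate'))  # ['Interest_Rate', 'Unemployment_Rate']
```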

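Better still, since selecting columns in pandas and getting a DataFrame back requires passing a list anyway, you can pass a list to an x_variables parameter (even if it is a list with a single item):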

def perform_regression_multiple(y: str, x_variables: list):
    columns = [y] + x_variables
    test = df[columns].reset_index(drop=True)

    X = test[x_variables]
    Y = test[[y]]

    regr = linear_model.LinearRegression()
    regr.fit(X, Y)

    model = sm.OLS(Y, X).fit()
    predictions = model.predict(X)

    print_model = model.summary()
    print(print_model)
