In Python, is there a way to use a for loop to run a linear regression multiple times with different X and Y variables?



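Essentially, I'm trying to run simple linear regressions on stocks' daily returns to find out which stocks mean-revert the most. My code pulls the daily returns of S&P 500 tickers into a DataFrame and then creates a lagged column for each ticker.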

import yfinance as yf
import numpy as np
import pandas as pd
from datetime import date, timedelta
from sklearn.linear_model import LinearRegression
enddt = date.today()
startdt = enddt - timedelta(days=90)
symbols = ['MMM', 'AOS', 'ABT']
data = yf.download(" ".join(symbols), start=startdt, end=enddt, auto_adjust=False)  # keep the 'Adj Close' column
daily_returns = data['Adj Close'].pct_change()
df2 = pd.DataFrame(daily_returns)
for symbol in symbols:
    df2[f'{symbol}_lag'] = df2[symbol].shift(1)
df3 = df2.drop(df2.index[[0,1]])
display(df3.head())
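To see why the first two rows are dropped with df2.index[[0,1]], here is a tiny sketch on made-up prices (the tickers AAA/BBB and the values are hypothetical): pct_change() leaves NaN in the first row, and shift(1) pushes that NaN down one more row, so the first row with both a return and a lagged return is row 2.

```python
import pandas as pd

# Hypothetical closing prices for two tickers over five days
prices = pd.DataFrame(
    {"AAA": [100.0, 101.0, 99.0, 100.0, 102.0],
     "BBB": [50.0, 50.5, 51.0, 50.0, 49.5]}
)

returns = prices.pct_change()  # row 0 is NaN: there is no previous price
for symbol in ["AAA", "BBB"]:
    # Yesterday's return aligned with today's row; rows 0 and 1 are NaN
    returns[f"{symbol}_lag"] = returns[symbol].shift(1)

clean = returns.iloc[2:]  # row 2 is the first row with no NaN
print(clean)
print(clean.isna().sum().sum())
```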

I started with a basic linear regression:

x = np.array(df3.MMM).reshape((-1,1))
y = np.array(df3.MMM_lag)
model = LinearRegression().fit(x,y)
print(f"R^2: {model.score(x, y)}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")
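One thing worth double-checking: the snippet above uses today's return as x and yesterday's return as y. To ask whether today's return depends on yesterday's (the usual setup when looking for mean reversion), the lagged return is typically the regressor and today's return the target, and a negative slope then suggests reversion. With a single regressor R^2 is the same in either direction, but the slope and intercept are not. The sketch below checks the usual orientation on synthetic AR(1) returns with an assumed coefficient of -0.3; the data, seed, and coefficient are illustrative assumptions, not stock data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
phi = -0.3  # assumed true AR(1) coefficient (mean-reverting)
r = np.zeros(n)
for t in range(1, n):
    r[t] = phi * r[t - 1] + rng.normal(scale=0.01)

x = r[:-1].reshape(-1, 1)  # yesterday's return (regressor, 2-D)
y = r[1:]                  # today's return (target, 1-D)
model = LinearRegression().fit(x, y)
print(f"slope: {model.coef_[0]:.3f}")  # expected to land near phi
print(f"R^2: {model.score(x, y):.3f}")
```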

The code above works, but ideally I'd like to pull in 400+ tickers, and I don't want to type out every regression by hand.

You can use a for loop to fit several linear regression models, using [] indexing on the pd.DataFrame to select the relevant column for each one.
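As a small illustration of that [] slicing (column names and values here are made up): single brackets return a 1-D Series, which is why X[symbol] gets .values.reshape(-1, 1) in the loop, whereas double brackets return a one-column DataFrame that scikit-learn accepts as X directly.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up returns for one ticker and its lag
df = pd.DataFrame({"MMM": [0.01, -0.02, 0.015, -0.005],
                   "MMM_lag": [0.0, 0.01, -0.02, 0.015]})

print(df["MMM"].shape)        # 1-D Series: would need reshape(-1, 1) to be X
print(df[["MMM_lag"]].shape)  # 2-D DataFrame: usable as X as-is

model = LinearRegression().fit(df[["MMM_lag"]], df["MMM"])
print(model.coef_)
```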

I simplified your code, since it referenced some variables that were not defined.

import yfinance as yf
import numpy as np
import pandas as pd
from datetime import date, timedelta
from sklearn.linear_model import LinearRegression
enddt = date.today()
startdt = enddt - timedelta(days=90)
symbols = ["MMM", "AOS", "ABT"]
data = yf.download(" ".join(symbols), start=startdt, end=enddt, auto_adjust=False)  # keep the "Adj Close" column
# construct feature matrix and target
X = data["Adj Close"].pct_change()
y = X.shift(1)
# drop first two rows (NaN from pct_change, then one more from shift)
X = X.iloc[2:, :]
y = y.iloc[2:, :]
for symbol in symbols:
    X_sym = X[symbol].values.reshape(-1, 1)
    y_sym = y[symbol]
    model = LinearRegression().fit(X_sym, y_sym)
    print(f"R^2: {model.score(X_sym, y_sym)}")
    print(f"intercept: {model.intercept_}")
    print(f"slope: {model.coef_}")
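With 400+ tickers, three printed lines per ticker are hard to compare. A minimal sketch, assuming returns is a DataFrame of daily returns with one column per ticker (such as data["Adj Close"].pct_change() above), collects every fit into one table sorted by slope. It uses yesterday's return as the regressor, so the most negative slopes (strongest day-to-day reversal) come first. The helper name lag_regressions and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def lag_regressions(returns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: regress each ticker's return on its previous-day return."""
    rows = []
    for symbol in returns.columns:
        # Per-ticker dropna, so tickers with shorter histories don't shrink the others
        pair = pd.DataFrame({"x": returns[symbol].shift(1), "y": returns[symbol]}).dropna()
        X = pair[["x"]]  # 2-D: yesterday's return
        y = pair["y"]    # 1-D: today's return
        model = LinearRegression().fit(X, y)
        rows.append({"symbol": symbol,
                     "slope": model.coef_[0],
                     "intercept": model.intercept_,
                     "r2": model.score(X, y)})
    # Most negative slope first
    return pd.DataFrame(rows).set_index("symbol").sort_values("slope")

# Demo on synthetic returns (not real market data)
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(scale=0.01, size=(60, 3)), columns=["AAA", "BBB", "CCC"])
print(lag_regressions(demo))
```

On the downloaded data this would be called as lag_regressions(data["Adj Close"].pct_change()); the leading NaN row is removed by the per-ticker dropna.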

Output:

[*********************100%***********************]  3 of 3 completed
R^2: 1.291986384543975e-08
intercept: 0.0016325991063324728
slope: [0.00011383]
R^2: 0.0015475148976311637
intercept: 0.00237810444818736
slope: [0.03956798]
R^2: 0.001977249242221757
intercept: 0.0017093384700974568
slope: [0.04447145]
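For a single regressor, ordinary least squares also has a closed form (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x), R^2 = corr(x, y)^2), so every ticker can be fitted at once with pandas column operations instead of one model per loop iteration. This is a swapped-in technique rather than what the answer above does; it is sketched here on synthetic data, and vectorized_lag_fit is a hypothetical name. As in the previous idea, yesterday's return is the regressor.

```python
import numpy as np
import pandas as pd

def vectorized_lag_fit(returns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: closed-form OLS of today's return on yesterday's, for all columns at once."""
    x = returns.shift(1)            # yesterday's return for each ticker
    y = returns                     # today's return for each ticker
    valid = x.notna() & y.notna()   # use only rows where both values exist
    x, y = x.where(valid), y.where(valid)
    x_c = x - x.mean()              # pandas means skip NaN by default
    y_c = y - y.mean()
    sxy = (x_c * y_c).sum()
    sxx = (x_c ** 2).sum()
    syy = (y_c ** 2).sum()
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    r2 = sxy ** 2 / (sxx * syy)
    result = pd.DataFrame({"slope": slope, "intercept": intercept, "r2": r2})
    return result.sort_values("slope")  # most negative slope first

# Demo on synthetic returns (not real market data)
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(scale=0.01, size=(80, 3)), columns=["AAA", "BBB", "CCC"])
print(vectorized_lag_fit(demo))
```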
