How to calculate the average R-squared of company data



Stock returns

I don't know how to calculate the average R-squared of individual stock returns against market returns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

lm = LinearRegression()
df1 = pd.read_csv(r'C:\Users\USER\Desktop\股價資料.csv')

x1 = df1[['NAME OF COMPANY']]
y1 = df1[['INDIVIDUAL COMPANY RETURN']]
y2 = df1[['MARKET RETURN']]
z1 = df1[['YEAR']]
# print(x1, z1)

list1 = []  # row indices where a new (company, year) group starts
list2 = []  # intended to hold the R-squared of each group
list3 = []
list4 = []

# The file has 379539 rows; stop one row early so that i + 1 stays in range
for i in range(len(df1) - 1):
    # A change in company name or year marks the start of a new group
    if x1.values[i] != x1.values[i + 1] or z1.values[i] != z1.values[i + 1]:
        list1.append(x1.index[i + 1])

# Not working yet: slice one group's returns out of y1
# y1.values[list1[i]:list1[i+1]]
# lm.fit(list3, list4)
# r_squared = lm.score(list3, list4)
# list2.append(r_squared)
# print(r_squared)
```

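This is what my code looks like so far. I've tried splitting the y1 and y2 data, running the R-squared on each part, and appending the values to list1.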

The R-squared formula is: 1 - unexplained_variation / total_variation

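The unexplained variation is the sum of the squared differences between each data point's actual value and the value predicted by the best-fit line. You can compute the coefficients of the best-fit line with numpy.polyfit().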

The total variation is the sum of the squared differences between each data point's actual value and the mean.
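Written as an equation, with a and b the slope and intercept returned by np.polyfit, \hat{y}_i the fitted value and \bar{y} the mean of the y values, the two sums are:

```latex
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad \hat{y}_i = a x_i + b
```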

Edit: with dummy values, it looks like this:

```python
import numpy as np

x = [2000, 2001, 2002, 2003, 2004]
y = [50000, 10000, 20000, 30000, 5000]


def get_unexplained_variation(xs, ys):
    # Coefficients of the best-fit line y = a*x + b
    a, b = np.polyfit(xs, ys, 1)
    var = 0
    for x, y in zip(xs, ys):
        # Squared gap between the fitted value and the actual value
        var += (a * x + b - y) ** 2
    return var


def get_total_variation(xs, ys):
    avg = np.mean(ys)
    var = 0
    for y in ys:
        # Squared gap between the mean and the actual value
        var += (y - avg) ** 2
    return var


unexplained_variation = get_unexplained_variation(x, y)
total_variation = get_total_variation(x, y)
print(1 - unexplained_variation / total_variation)
```
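To get the average the question asks for, one option is to compute the same R-squared for every (company, year) group and then take the mean. Below is a minimal sketch, assuming the CSV has the columns named in the question and that each group's R-squared should come from regressing INDIVIDUAL COMPANY RETURN on MARKET RETURN. The helper names r_squared and average_r_squared are hypothetical, and pandas groupby stands in for the manual search for group boundaries in list1. The cut-off of three rows is an arbitrary choice: smaller groups, or groups with constant returns, get NaN, which mean() skips by default.

```python
import numpy as np
import pandas as pd


def r_squared(xs, ys):
    """R^2 of a straight-line fit of ys on xs (hypothetical helper)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    total = np.sum((ys - ys.mean()) ** 2)  # total variation
    # Assumed cut-off: too few points or flat returns give no usable R^2
    if len(ys) < 3 or total == 0:
        return np.nan
    a, b = np.polyfit(xs, ys, 1)  # slope and intercept of the best-fit line
    unexplained = np.sum((ys - (a * xs + b)) ** 2)  # unexplained variation
    return 1 - unexplained / total


def average_r_squared(df):
    """Mean R^2 over all (company, year) groups, plus the per-group values."""
    per_group = (
        df.groupby(['NAME OF COMPANY', 'YEAR'])[['MARKET RETURN', 'INDIVIDUAL COMPANY RETURN']]
        .apply(lambda g: r_squared(g['MARKET RETURN'], g['INDIVIDUAL COMPANY RETURN']))
    )
    return per_group.mean(), per_group


# Made-up toy data that reuses the column names from the question
toy = pd.DataFrame({
    'NAME OF COMPANY': ['A'] * 4 + ['B'] * 4,
    'YEAR': [2000] * 8,
    'MARKET RETURN': [1, 2, 3, 4, 1, 2, 3, 4],
    'INDIVIDUAL COMPANY RETURN': [2, 4, 6, 8, 1, 3, 2, 4],
})
avg, per_group = average_r_squared(toy)
print(per_group)
print(avg)  # about 0.82: the mean of 1.0 (company A) and 0.64 (company B)
```

On the real file this would be called as average_r_squared(df1) after the pd.read_csv line from the question. Note that it averages the per-group R-squared values; that is not the same as the R-squared of a single regression over all rows.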
