Constrained Multi-Linear Regression using Gekko



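I have a multi-linear regression problem where I have prior information about the range of the output (the dependent variable y): the prediction must always lie within that range.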

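I want to find upper and lower bounds on the coefficient of each feature (independent variable) so that the linear regression model is restricted to the desired output range.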
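One way to see how coefficient values and the output range interact is to compute the smallest and largest prediction a linear model can produce when each feature is known to lie in an interval. The sketch below is only an illustration of that idea: the helper `prediction_range`, the two-feature model, and the feature intervals are all hypothetical and are not taken from the Boston data.

```python
def prediction_range(coefs, intercept, x_low, x_high):
    """Return (min, max) of intercept + sum(c_i * x_i) for x_low <= x <= x_high."""
    lo = hi = intercept
    for c, xl, xh in zip(coefs, x_low, x_high):
        # Each term c * x is linear in x, so its extremes sit at the interval ends.
        lo += min(c * xl, c * xh)
        hi += max(c * xl, c * xh)
    return lo, hi


# Hypothetical two-feature model (values chosen for illustration only).
coefs = [2.0, -1.0]
intercept = 10.0
x_low = [0.0, 0.0]
x_high = [5.0, 4.0]

y_min, y_max = prediction_range(coefs, intercept, x_low, x_high)
inside = 10.0 <= y_min and y_max <= 40.0
print(y_min, y_max, inside)
```

With these made-up numbers the model can predict values below 10, so these coefficients would not satisfy a [10, 40] requirement over the whole feature box.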

For example, I want to apply this to sklearn.datasets.load_boston, where I know the house price will be in the range [10, 40] (the actual minimum and maximum of y are [5, 50]).

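In the example below, my target is a prediction between 10 and 40, and on that basis I want to find the minimum and maximum bounds for all the coefficients.
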
from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
from gekko import GEKKO

# Loading the dataset
x, y = load_boston(return_X_y=True)

c  = m.Array(m.FV, x.shape[1]+1) #array of parameters and intercept
for ci in c:
    ci.STATUS = 1 #calculate fixed parameter
    ci.LOWER  = 1 #constraint: lower limit
    ci.UPPER  = 0 #constraint: lower limit

#Define variables
xd = m.Array(m.Param,x.shape[1])
for i in range(x.shape[1]):
    xd[i].value = x[i]
yd = m.Param(y); yp = m.Var()

#Equation of Linear Functions
y_pred = (c[0]*xd[0]+
          c[1]*xd[1]+
          c[2]*xd[2]+
          c[3]*xd[3]+
          c[4]*xd[4]+
          c[5]*xd[5]+
          c[6]*xd[6]+
          c[7]*xd[7]+
          c[8]*xd[8]+
          c[9]*xd[9]+
          c[10]*xd[10]+
          c[11]*xd[11]+
          c[12]*xd[12]+
          c[13])

#Inequality Constraints
yp = m.Var(lb=5,ub=40)
m.Equation(yp==y_pred)

#Minimize difference between actual and predicted y
m.Minimize((yd-yp)**2)

#Solve
m.solve(disp=True)

#Retrieve parameter values
a = [i.value[0] for i in c]
print(a)

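The code listed above gives me a "solution not found" error. Can you point out what I am doing wrong here, or whether I am missing something important? Also, how do I get both the upper and lower bounds of c for all the independent variables?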

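Try IMODE=2 for regression mode. There are a few modifications: x[:,i] to load the data column by column, ci.LOWER=0 as the lower bound, and ci.UPPER=1 as the upper bound.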

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
from gekko import GEKKO

m = GEKKO(remote=False)

# Loading the dataset
# (load_boston was removed in scikit-learn 1.2, so this needs an older release)
x, y = load_boston(return_X_y=True)
n = x.shape[1]

c  = m.Array(m.FV, n+1) #array of parameters and intercept
for ci in c:
    ci.STATUS = 1 #calculate fixed parameter
    ci.LOWER  = 0 #constraint: lower limit
    ci.UPPER  = 1 #constraint: upper limit

#Load data
xd = m.Array(m.Param,n)
for i in range(n):
    xd[i].value = x[:,i] #column i holds feature i for every sample
yd = m.Param(y)

#Equation of Linear Functions
yp = m.Var(lb=5,ub=40)
m.Equation(yp==m.sum([c[i]*xd[i]
                      for i in range(n)]) + c[n]) #c[n] is the intercept

#Minimize difference between actual and predicted y
m.Minimize((yd-yp)**2)

#Regression mode
m.options.IMODE=2

#APOPT solver
m.options.SOLVER = 1

#Solve
m.solve(disp=True)

#Retrieve parameter values
a = [i.value[0] for i in c]
print(a)
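The two data-related changes can be checked without Gekko. The sketch below uses a small synthetic array in place of the real dataset (its shape and values are assumptions for illustration) to show why x[i] and x[:,i] differ, and a hypothetical helper `bounds_are_consistent` to show why LOWER=1 with UPPER=0 leaves no feasible value.

```python
import numpy as np

# Synthetic stand-in for the dataset: 6 samples, 3 features (illustrative only).
x = np.arange(18, dtype=float).reshape(6, 3)

row = x[0]      # one sample: one value per feature
col = x[:, 0]   # one feature: one value per sample

print(row.shape, col.shape)


def bounds_are_consistent(lower, upper):
    """A bound pair admits at least one value only if lower <= upper."""
    return lower <= upper


print(bounds_are_consistent(1, 0))  # bounds as written in the question
print(bounds_are_consistent(0, 1))  # bounds as written in the answer
```

In regression mode each parameter in xd is expected to carry one value per data row, which is what the column slice provides.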

A solution to this constrained problem is obtained with either IPOPT or APOPT (slightly faster).

Number of state variables:    7604
Number of total equations: -  7590
Number of slack variables: -  0
---------------------------------------
Degrees of freedom       :    14

----------------------------------------------
Model Parameter Estimation with APOPT Solver
----------------------------------------------

Iter    Objective  Convergence
0  5.60363E+04  1.25000E+00
1  1.87753E+05  6.66134E-16
2  3.06630E+04  9.99201E-16
3  3.06630E+04  3.84187E-16
5  3.06630E+04  1.35308E-16
Successful solution

---------------------------------------------------
Solver         :  APOPT (v1.0)
Solution time  :  0.48429999999999995 sec
Objective      :  30662.976306748842
Successful solution
---------------------------------------------------

[0.0, 0.11336851243, 0.0, 1.0, 0.43491543768, 
1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.037613878067,
0.0, 1.0]
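The printed coefficients can be inspected to see which of the [0, 1] bounds are active. The sketch below copies the values from the output above; the tolerance `tol` is an assumption, and the interpretation of the last entry as the intercept follows the answer's code.

```python
# Coefficients as printed above: 13 feature coefficients followed by the intercept.
a = [0.0, 0.11336851243, 0.0, 1.0, 0.43491543768,
     1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.037613878067,
     0.0, 1.0]

lower, upper = 0.0, 1.0
tol = 1e-8  # assumed tolerance for "sitting on a bound"

at_lower = [i for i, v in enumerate(a) if abs(v - lower) <= tol]
at_upper = [i for i, v in enumerate(a) if abs(v - upper) <= tol]
interior = [i for i, v in enumerate(a) if lower + tol < v < upper - tol]

print("at lower bound:", at_lower)
print("at upper bound:", at_upper)
print("strictly inside:", interior)

# Degrees of freedom reported by the solver: state variables minus equations.
dof = 7604 - 7590
print(dof == len(a))
```

Most of the coefficients sit on a bound, which suggests that the [0, 1] limits, and not only the limits on yp, are shaping this particular fit.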
