r语言 - 如何加速线性规划问题的求解?



我有以下问题。我有n(通常n = 1000)个数据点(来自{1,2,3}的整数,所以有很多重复的数字)和一个实数d。我必须选择k<n个点(k是给定的),使k点的平均值与点d之间的距离最小。这可以表示为一个MILP问题(请参阅这里)。>

我试图在R中使用lpSolve和Rglpk包来解决这个问题,但需要很多时间来解决它(我试图解决n = 100点,代码已经运行了40分钟)。我猜问题是有很多二进制变量(n),也有重复的数字。

library(Rglpk)
set.seed(123)
sampsize <- sample(c(1,2,3),size=100,replace = TRUE)
k <- 50
d <- 86/47
lngth <- length(sampsize)
f.obj <- c(1,rep(0,lngth))
f.con <- matrix(c(0,rep(1,lngth),-1,sampsize/k,1,sampsize/k),nrow=3, byrow = TRUE)
f.dir <- c("==","<=",">=")  
f.rhs <- c(k,d,d)
f.types <- c("C",rep("B",lngth))
res <- Rglpk_solve_LP(obj=f.obj,mat=f.con,dir=f.dir,rhs=f.rhs,max=FALSE,types=f.types)

我将满足于一个次优解。是否有一种方法可以快速解决问题,或者以某种方式重新表达问题以加快算法速度?

我将感谢您对此的任何建议。

CVXR是一个更好的工具:

#
# generate random data
#
set.seed(123)
N <- 100           # sample size
v <- c(1,2,3)      # sample from these unique values
M <- length(v)     # number of unique values
data <- sample(v, size=N, replace=TRUE)
tab <- table(data) # tabulate  
K <- 50            # number of points to choose
target <- 86/47    # target for mean 
#
#  CVXR model
#  see https://cvxr.rbind.io/
#
library(CVXR)
# select a number of values from each bin
select <- Variable(M,integer=T) 
# obj: sum of absolute deviations
objective = Minimize(abs(sum(v*select)/K-target))
# include nonnegativity constraints
constraints = list(sum(select)==K, select >= 0, select <= vec(tab))
problem <- Problem(objective, constraints)
sol <- solve(problem,verbose=T)
cat("n")
cat("Status:",sol$status,"n")
cat("Objective:",sol$value,"n")
cat("Solution:",sol$getValue(select),"n")

输出:

GLPK Simplex Optimizer, v4.65
9 rows, 4 columns, 17 non-zeros
0: obj =   0.000000000e+00 inf =   5.183e+01 (2)
3: obj =   5.702127660e-01 inf =   0.000e+00 (0)
*     4: obj =   1.065814104e-16 inf =   0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
GLPK Integer Optimizer, v4.65
9 rows, 4 columns, 17 non-zeros
3 integer variables, none of which are binary
Integer optimization begins...
Long-step dual simplex will be used
+     4: mip =     not found yet >=              -inf        (1; 0)
+    55: >>>>>   1.021276596e-02 >=   9.787234043e-03   4.2% (52; 0)
+    56: >>>>>   9.787234043e-03 >=   9.787234043e-03 < 0.1% (16; 36)
+    56: mip =   9.787234043e-03 >=     tree is empty   0.0% (0; 103)
INTEGER OPTIMAL SOLUTION FOUND
Status: optimal 
Objective: 0.009787234 
Solution: 26 7 17 

下面是用python写的,但我认为这个概念很容易传达,如果需要的话,可以在r中重新表述。

基本上:重新制定你的问题。而不是优化二进制"选择"的长向量;变量,你所需要的只是3个变量来表述它,特别是1、2和3的(整数)个数。

这几乎可以立即作为IP解决。

import pyomo.environ as pyo
from random import randint
n = 1000
k = 500
sample = [randint(1, 3) for t in range(n)]
avail = {t : len([val for val in sample if val==t]) for t in range(1, 4)}
target = 86/47
m = pyo.ConcreteModel()
m.vals = pyo.Set(initialize=[1,2,3])
m.pick = pyo.Var(m.vals, domain=pyo.NonNegativeIntegers)
m.delta = pyo.Var()
m.obj = pyo.Objective(expr=m.delta)
# constrain the delta as an absolute value of |sum(picks) - target|
m.C1 = pyo.Constraint(expr=m.delta >= sum(m.pick[v]*v for v in m.vals)-target*k)
m.C2 = pyo.Constraint(expr=m.delta >= -sum(m.pick[v]*v for v in m.vals)+target*k)
# don't use more than available for each value
def limit(m, v):
return m.pick[v] <= avail[v]
m.C3 = pyo.Constraint(m.vals, rule=limit)
soln = pyo.SolverFactory('glpk').solve(m)
print(soln)
m.pick.display()

收益率:

Solver: 
- Status: ok
Termination condition: optimal
Statistics: 
Branch and bound: 
Number of bounded subproblems: 885
Number of created subproblems: 885
Error rc: 0
Time: 0.3580749034881592
Solution: 
- number of solutions: 0
number of solutions displayed: 0
pick : Size=3, Index=vals
Key : Lower : Value : Upper : Fixed : Stale : Domain
1 :     0 :   3.0 :  None : False : False : NonNegativeIntegers
2 :     0 :   0.0 :  None : False : False : NonNegativeIntegers
3 :     0 : 304.0 :  None : False : False : NonNegativeIntegers

意识到你也可以很有效地攻击这个算法,得到一个(相当简单的)接近最优的答案,或者用一些汗水来得到最优的答案。下面是我修改过的一个框架。关键的观察是,你可以"添加更多的3"。直到达到k * target的量可以全部用1填满为止。非常接近于,除非你最好在末尾替换几个2,或者如果你用完了1就备份。

下面运行(在python中),并且在的大部分方法中是一个很好的近似。

### Code:
# average hitting
from random import randint
n = 1000
k = 50
sample = [randint(1, 3) for t in range(n)]
available = {t : len([val for val in sample if val==t]) for t in range(1, 4)}
target = 86/47
print(f'available at start: {available}')

sum_target = target * k
soln = []
selections_remaining = k
togo = sum_target - sum(soln)
for pick in range(k):
if togo > k - pick and available[3] > 0:
soln.append(3)
available[3] -= 1
elif togo > k - pick and available[2] > 0:
soln.append(2)
available[2] -= 1
elif available[1] > 0:
soln.append(1)
available[1] -= 1
else: # ran out of ones in home stretch... do a swap
pass
# some more logic...
togo = sum_target - sum(soln)
print(f'solution: {soln}') 
print(f'generated: {sum(soln)/k} for target of {target}')
print(f'leftover: {available}')

收益率:

available at start: {1: 349, 2: 335, 3: 316}
solution: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
generated: 1.84 for target of 1.8297872340425532
leftover: {1: 291, 2: 335, 3: 274}
[Finished in 117ms]

最新更新