Cost function and gradient seem to work, but the scipy.optimize functions don't



I am converting the Matlab code from Andrew Ng's Coursera course to Python. I am working on unregularized logistic regression, and after writing the gradient and cost functions I needed something similar to fminunc. After some Googling, I found a couple of options. They all return the same result, but that result does not match what is expected in Andrew Ng's code. Others seem to get this working correctly, so I would like to know why my particular code does not return the desired result when using the scipy.optimize functions, even though the cost and gradient sections earlier in the code do.

The data I am using can be found at the link below:

ex2data1

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op

#Machine Learning Online Class - Exercise 2: Logistic Regression
#Load Data
#The first two columns contain the exam scores and the third column contains the label.
data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) #100 x 2
y = np.array(data.iloc[:,2]) #100 x 1
y.shape = (len(y), 1)

#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]

#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand the 
#the problem we are working with.
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()

def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g

def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. to the parameters.
    '''
    m = len(y) #number of training examples
    h = sigmoid(X.dot(theta)) #logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    #h is 100x1, y is 100x1, these end up as two vectors we subtract from each other
    #then we sum the values by rows
    #cost function for logistic regression
    return J

def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros(theta.shape)
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)): #number of rows in theta
        XT = X[:,i]
        XT.shape = (len(X),1)
        grad[i] = (1/m) * np.sum((h-y)*XT) #updating each row of the gradient
    return grad

#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You need to complete the code in costFunction.m

#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))

#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))

#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)
print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')

#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]])
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)
print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')

result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]

Result = op.minimize(fun = costFunction,
                     x0 = initial_theta,
                     args = (X, y),
                     method = 'TNC',
                     jac = gradient, options={'gtol': 1e-3, 'disp': True, 'maxiter': 1000})

theta = Result.x
theta
test = np.array([[1, 45, 85]]) 
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')

This is a really tough problem to debug, and it illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed in as a vector:

Minimization of scalar function of one or more variables.

In general, the optimization problems are of the form:

minimize f(x) subject to
g_i(x) >= 0,  i = 1,...,m
h_j(x)  = 0,  j = 1,...,p 

where x is a vector of one or more variables.

The important thing is that they really do mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
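To illustrate the kind of mismatch this can produce (a minimal sketch of my own, not taken from the code above): subtracting a 2-d column array from a 1-d array broadcasts into a full matrix rather than an element-wise difference, which silently corrupts sums like the one inside the cost function.

import numpy as np

h = np.zeros(100)         # shape (100,): what X.dot(theta) yields when theta arrives as a 1-d array
y = np.zeros((100, 1))    # shape (100, 1): the column vector built from the data

print((h - y).shape)      # (100, 100) -- broadcasting, not an element-wise difference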

I don't know exactly why this is causing a problem in your case, but it's easy to fix either way. You just have to add the following at the top of both your cost function and your gradient function:

theta = theta.reshape(-1, 1)                                           

This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
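For reference, a minimal sketch of what that one-line change might look like when applied to the question's two functions (everything else is taken from the code in the question):

def costFunction(theta, X, y):
    theta = theta.reshape(-1, 1)          # force theta back into an (n, 1) column array
    m = len(y)
    h = sigmoid(X.dot(theta))
    return (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))

def gradient(theta, X, y):
    theta = theta.reshape(-1, 1)          # same reshape at the top of the gradient
    m = len(y)
    grad = np.zeros(theta.shape)
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)):
        XT = X[:, i].reshape(len(X), 1)
        grad[i] = (1/m) * np.sum((h - y) * XT)
    return grad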

I had a similar issue with Scipy when dealing with the same problem as you. As senderle points out, the interface is not the easiest to deal with, especially in combination with the numpy array interface... Here is my implementation, which works as expected.

Define the cost and gradient functions

Note that initial_theta is passed in as a simple array of shape (3,) and is converted to a column vector of shape (3,1) within the function. The gradient function then returns grad.ravel(), which has shape (3,) again. This is important, because doing otherwise caused error messages with various optimization methods in scipy.optimize.

Note that different methods have different behaviours, but returning .ravel() seems to fix most of the issues...

import pandas as pd
import numpy as np
import scipy.optimize as opt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta, X, y):
    #Initializing variables
    m = len(y)
    J = 0
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    J = (1/m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1 - h))
    return J

def Gradient(theta, X, y):
    #Initializing variables
    m = len(y)
    theta = theta[:, np.newaxis]
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m) * (X.T @ (h - y))
    return grad.ravel() #<-- This is the trick

Initialize variables and parameters

Note that initial_theta.shape returns (3,)

X = data1.iloc[:,0:2].values
m,n = X.shape
X = np.concatenate((np.ones(m)[:,np.newaxis],X),1)
y = data1.iloc[:,-1].values[:,np.newaxis]
initial_theta = np.zeros((n+1))

Call scipy.optimize

model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
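As a short usage sketch (my own addition, reusing the test point from the question), the fitted parameters can then be read back from the result object and used for a prediction:

theta = model.x                      # optimized parameters, shape (3,)
print(model.fun)                     # cost at the optimum

test = np.array([1, 45, 85])         # bias term plus the two exam scores
prob = sigmoid(test @ theta)         # expected to be about 0.775 per the exercise
print(prob)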

Any comments from more knowledgeable people are welcome; this Scipy interface is a mystery to me. Thanks!
