Logistic regression not working in Python



I am trying to implement logistic regression in Python using scipy.optimize, and I am getting the error described below.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.optimize as sci

    data = pd.read_csv("data.txt")
    X = data.iloc[:, :-1]
    y = data.iloc[:, -1]
    admitted = data.loc[y == 1]
    not_admitted = data.loc[y == 0]

    # plotting the data
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], color='red', marker='X')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], color='green', marker='o')
    plt.show()

    # prepend a column of ones for the intercept term
    X = np.c_[np.ones((X.shape[0], 1)), X]
    y = y.to_numpy()[:, np.newaxis]
    theta = np.zeros((X.shape[1], 1))

    def sigmoid(x):
        return 1/1+np.exp(-x)

    def net_input(theta, x):
        return np.dot(x, theta)

    def probability(theta, x):
        return sigmoid(net_input(theta, x))

    def cost_func(theta, x, y):
        m = x.shape[0]
        total_cost = -(1 / m) * np.sum(y * np.log(probability(theta, x)) + (1 - y) * np.log(1 - probability(theta, x)))
        return total_cost

    def gradient(theta, x, y):
        m = x.shape[0]
        return (1 / m) * np.dot(x.T, probability(theta, x) - y)

    def fit(x, y, theta):
        opt_weights = sci.fmin_tnc(func=cost_func, x0=theta, fprime=gradient, args=(x, y.flatten()))
        return opt_weights[0]

    parameters = fit(X, y, theta)
    print('The value of parameters are ' + str(parameters))

I get this error:

    RuntimeWarning: invalid value encountered in log
    total_cost = -(1 / m) * np.sum(y * np.log(probability(theta, x)) + (1 - y) * np.log(1 - probability(theta, x)))
    NIT   NF   F                       GTG
    0    1                    NAN   1.52331587E+04
    tnc: fscale = 0.00810224
    0   66                    NAN   1.52331587E+04
    tnc: Linear search failed

I do know that you cannot take the log of a negative value, but I never got this error in Octave. Can anyone help?

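This happens because your sigmoid is not numerically safe. First, `1/1+np.exp(-x)` is parsed as `(1/1) + np.exp(-x)`, which is always greater than 1, so the `np.log(1 - ...)` term in the cost receives a negative argument and returns NaN; the denominator needs parentheses: `1/(1+np.exp(-x))`. Second, even with the parentheses, the sigmoid is evaluated without any bounds: for inputs of very large magnitude `np.exp` overflows to +INF or underflows to 0, the result saturates at exactly 0 or 1, and the log in the cost becomes -INF. This is a floating-point precision problem in how the sigmoid is computed, not a problem with the model itself.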
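As a quick check (a sketch only; `sigmoid_as_posted` and `sigmoid_parenthesized` are names made up for this illustration), the snippet below evaluates the sigmoid exactly as it is written in the question next to a version with a parenthesized denominator, and then feeds the latter inputs of very large magnitude:

```python
import numpy as np

def sigmoid_as_posted(x):
    # Same expression as in the question: evaluated as (1/1) + np.exp(-x).
    return 1/1+np.exp(-x)

def sigmoid_parenthesized(x):
    # Denominator grouped explicitly.
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 2.0])
posted = sigmoid_as_posted(z)      # every value is greater than 1
fixed = sigmoid_parenthesized(z)   # every value lies strictly between 0 and 1

# Silence the warning from the question so the result can be inspected.
with np.errstate(invalid="ignore"):
    log_term_posted = np.log(1 - posted)  # log of a negative number -> NaN
log_term_fixed = np.log(1 - fixed)        # finite

# For extreme inputs np.exp overflows, and the output saturates at 0 or 1.
with np.errstate(over="ignore"):
    saturated = sigmoid_parenthesized(np.array([-1000.0, 1000.0]))

print(posted, log_term_posted)
print(fixed, log_term_fixed)
print(saturated)
```

The first print shows where the NaN in the optimizer output comes from; the last one shows why a parenthesized but otherwise naive sigmoid can still cause trouble once the cost takes the log of its output.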

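As mentioned here, you can modify your sigmoid function like this (the solution from there, written with np.where so that it also accepts the NumPy arrays your code passes in), and your problem will be solved:

    def sigmoid(x):
        """Numerically stable sigmoid function."""
        x = np.asarray(x, dtype=float)
        # exp(-|x|) is at most 1, so it can never overflow
        z = np.exp(-np.abs(x))
        # x >= 0: 1 / (1 + exp(-x));  x < 0: exp(x) / (1 + exp(x))
        return np.where(x >= 0, 1 / (1 + z), z / (1 + z))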
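A stable sigmoid can still return exactly 0 or 1 for extreme inputs, so the log in the cost may still hit log(0). One possible way around that, sketched here under assumptions rather than taken from the original post, is to compute the cost directly from the linear term with `np.logaddexp` and to use SciPy's built-in `scipy.special.expit` for the gradient. The synthetic data below is a hypothetical stand-in for `data.txt`, whose contents are not shown in the question.

```python
import numpy as np
from scipy.optimize import fmin_tnc
from scipy.special import expit  # SciPy's built-in, overflow-safe sigmoid

def cost_func(theta, x, y):
    z = x @ theta
    # -log(sigmoid(z))     = log(1 + exp(-z)) = logaddexp(0, -z)
    # -log(1 - sigmoid(z)) = log(1 + exp(z))  = logaddexp(0, z)
    return np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

def gradient(theta, x, y):
    # Gradient of the mean cross-entropy: X^T (sigmoid(X theta) - y) / m
    return x.T @ (expit(x @ theta) - y) / x.shape[0]

# Synthetic, overlapping two-class data (assumption: stands in for data.txt).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
noise = rng.normal(scale=1.0, size=200)
y = (features[:, 0] + features[:, 1] + noise > 0).astype(float)
X = np.c_[np.ones(len(features)), features]  # column of ones for the intercept

theta0 = np.zeros(X.shape[1])
theta_opt, n_evals, return_code = fmin_tnc(
    func=cost_func, x0=theta0, fprime=gradient, args=(X, y), disp=0
)

print("fitted parameters:", theta_opt)
print("final cost:", cost_func(theta_opt, X, y))
```

Because the cost never forms the probability and then takes its log, it stays finite even when `x @ theta` is very large, which is the situation that produced NaN in the optimizer output.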