Putting together a linear regression solution



https://gist.github.com/marcelcaraciolo/1321585

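Using this code, I am trying to find the theta coefficients for a dataset that I currently hold in a numpy array. I have saved the training array to a csv file named "foo.csv". I produced it by converting another csv file with the pandas library, and my training set is currently 10886 rows by 12 columns. My first column is my Y, the value I wish to predict, and all the other columns are the variables for which I want theta values.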

This should mean that I end up with a 12-by-1 matrix of theta values, since there are 12 dependent variables.
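As a rough sanity check on the shapes described here, the sketch below uses random stand-in data (not the real foo.csv) with 12 columns. With the first column taken as y, 11 feature columns remain, and adding an intercept column of ones gives 12 columns and therefore a 12-by-1 theta. The variable names are illustrative only.

```python
import numpy as np

# Stand-in for the training set: random numbers, NOT the real data.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 12))   # the real set would be 10886 x 12

y = data[:, 0].reshape(-1, 1)      # first column is the target
X = data[:, 1:]                    # remaining 11 columns are features

# Prepend a column of ones for the intercept term.
design = np.column_stack([np.ones(len(X)), X])
theta = np.zeros((design.shape[1], 1))

print(design.shape, theta.shape)   # (20, 12) (12, 1)
```

Note that a slice such as data[:, 1:11] selects only 10 columns (indices 1 through 10), so data[:, 1:] is what matches an (m, 12) design matrix and a (12, 1) theta.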

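Now, I am still fairly new to Python. I am currently running iPython and want to pass in the name of my test array, since I saved it as a csv file named "foo.csv". I would like to be able to write [1] MVLR.calctheta(foo.csv) and have the output be a 12-by-1 matrix. But I can't get it to work. I keep getting: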

AttributeError: 'module' object has no attribute 'calctheta'

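But I have clearly saved calctheta as a function, and I don't understand why I can't call it. Did I declare the method incorrectly? My assumption is that I can compute the theta values and then run a for loop that evaluates each test row using those theta values and the dependent variables.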
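The question does not show how the module was imported, so the cause of the AttributeError is not certain. One common cause in an interactive session is that the function was added to the file after the module had already been imported, in which case the session still holds the old version. The sketch below reproduces that situation with a throwaway module (the name mvlr_demo is hypothetical) and shows hasattr and importlib.reload as a way to check and refresh it.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True   # keep the demo free of cached .pyc files

# Write a tiny throwaway module to a temp directory (hypothetical name).
tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir) / "mvlr_demo.py"
module_path.write_text("def other():\n    return 1\n")
sys.path.insert(0, tmp_dir)

import mvlr_demo
print(hasattr(mvlr_demo, "calctheta"))   # False: not defined yet

# Add the function to the file afterwards, as happens while editing.
module_path.write_text(
    "def other():\n    return 1\n\n"
    "def calctheta(name):\n    return 'would load ' + name\n"
)
importlib.invalidate_caches()
importlib.reload(mvlr_demo)
print(hasattr(mvlr_demo, "calctheta"))   # True after reload
print(mvlr_demo.calctheta("foo.csv"))
```

The file name is passed as a quoted string here. An unquoted foo.csv would be read by Python as the attribute csv of a variable named foo.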

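The problem I am running into is with the calctheta function, which I changed from the github code above. I want it set up so that I can call calctheta with my csv file.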

def calctheta(name):
    data = genfromtxt (name, delimiter=",")
    y = data[:,0]
    X = data[:,1:11]

    #number of training samples
    m = y.size
    y.shape = (m, 1)
    #Scale features and set them to zero mean
    x, mean_r, std_r = feature_normalize(X)
    #Add a column of ones to X (interception data)
    it = ones(shape=(m, 12))
    it[:, 1:12] = x
    #Some gradient descent settings
    iterations = 100
    alpha = 0.01
    #Init Theta and Run Gradient Descent
    theta = zeros(shape=(11, 1))
    theta, J_history = gradient_descent(it, y, theta, alpha, iterations)
    print theta
    plot(arange(iterations), J_history)
    xlabel('Iterations')
    ylabel('Cost Function')
    show()

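On another note, this multivariate linear regression problem has many dependent variables. Some of my variables are coded as ranks running from 0 up to however many options there are.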

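For example, if a column has 3 options, its distribution is determined by the training set, whereas other columns hold raw values, so the mean is simply that of the raw values (a temperature column, for example).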

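My question is whether the fact that the variables are coded on different scales when computing the theta values disqualifies the use of multivariate linear regression. I would think it does not, if we assume that the final quantity you are trying to measure is normally distributed with respect to its inputs.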
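Having columns on different scales does not by itself rule out linear regression. The question does not say whether the coded columns are ordered or are plain labels. If they are labels without a meaningful order, one common approach (an assumption here, not something taken from the question) is to one-hot encode them and to standardize only the continuous columns. A minimal sketch with made-up values:

```python
import numpy as np

# Hypothetical coded column with 3 options (0, 1, 2) and a raw temperature column.
category = np.array([0, 2, 1, 2, 0])
temperature = np.array([9.8, 14.2, 21.5, 30.1, 12.0])

# One-hot encode, dropping the first level so the new columns are not
# collinear with the intercept column.
levels = np.unique(category)
one_hot = (category[:, None] == levels[None, 1:]).astype(float)

# Standardize only the continuous column.
temp_scaled = (temperature - temperature.mean()) / temperature.std()

# Intercept + 2 indicator columns + 1 scaled column.
X = np.column_stack([np.ones(len(category)), one_hot, temp_scaled])
print(X.shape)   # (5, 4)
```

If the codes do carry an order, keeping them as a single numeric column is also a defensible choice. It assumes that each step between levels has the same effect on the target.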

Edit:

I added this at the top of the code and indented the rest of the code under it:

class MVLR:

I now get:

NameError: name 'calctheta' is not defined
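Once a function is moved inside a class it is no longer a module-level name. It becomes an attribute of the class, so it has to be reached through an instance from outside and through self from inside other methods. A bare call such as calctheta(...) or feature_normalize(...) then raises NameError. The sketch below shows the calling pattern with a deliberately simplified body. The method contents are stand-ins, not the real regression code.

```python
class MVLR(object):
    def feature_normalize(self, values):
        # Center the values on zero; a stand-in for the real normalization.
        mean = sum(values) / float(len(values))
        return [v - mean for v in values]

    def calctheta(self, values):
        # Inside the class, other methods are reached through self.
        return self.feature_normalize(values)


model = MVLR()                             # create an instance first
print(model.calctheta([1.0, 2.0, 3.0]))    # [-1.0, 0.0, 1.0]
```

Every method also needs self as its first parameter, written as def feature_normalize(self, X). The form def feature_normalize(self.X) is a syntax error.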

Edit 2:

My code:

class MVLR:

from numpy import loadtxt, zeros, ones, array, genfromtxt, linspace, logspace, mean, std, arange
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from pylab import plot, show, xlabel, ylabel
#Evaluate the linear regression
def __init__(self, name):
    self.name = name
def feature_normalize(self.X):
    mean_r = []
    std_r = []
    X_norm = X
    n_c = X.shape[1]
    for i in range(n_c):
        m = mean(X[:, i])
        s = std(X[:, i])
        mean_r.append(m)
        std_r.append(s)
        X_norm[:, i] = (X_norm[:, i] - m) / s
    return X_norm, mean_r, std_r

def compute_cost(self, X, y, theta):
    '''
    Comput cost for linear regression
    '''
    #Number of training samples
    m = y.size
    predictions = X.dot(theta)
    sqErrors = (predictions - y)
    J = (1.0 / (2 * m)) * sqErrors.T.dot(sqErrors)
    return J

def gradient_descent(self, X, y, theta, alpha, num_iters):
    '''
    Performs gradient descent to learn theta
    by taking num_items gradient steps with learning
    rate alpha
    '''
    m = y.size
    J_history = zeros(shape=(num_iters, 1))
    for i in range(num_iters):
        predictions = X.dot(theta)
        theta_size = theta.size
        for it in range(theta_size):
            temp = X[:, it]
            temp.shape = (m, 1)
            errors_x1 = (predictions - y) * temp
            theta[it][0] = theta[it][0] - alpha * (1.0 / m) * errors_x1.sum()
        J_history[i, 0] = compute_cost(X, y, theta)
    return theta, J_history
#Load the dataset

#Plot the data
'''
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
n = 100
for c, m, zl, zh in [('r', 'o', -50, -25)]:
    xs = data[:, 0]
    ys = data[:, 1]
    zs = data[:, 2]
    ax.scatter(xs, ys, zs, c=c, marker=m)
ax.set_xlabel('Size of the House')
ax.set_ylabel('Number of Bedrooms')
ax.set_zlabel('Price of the House')
plt.show()
'''
def calctheta(self, name):
    data = genfromtxt (name, delimiter=",")
    y = data[:,0]
    X = data[:,1:11]

    #number of training samples
    m = y.size
    y.shape = (m, 1)
    #Scale features and set them to zero mean
    x, mean_r, std_r = feature_normalize(X)
    #Add a column of ones to X (interception data)
    it = ones(shape=(m, 12))
    it[:, 1:12] = x
    #Some gradient descent settings
    iterations = 100
    alpha = 0.01
    #Init Theta and Run Gradient Descent
    theta = zeros(shape=(11, 1))
    theta, J_history = gradient_descent(it, y, theta, alpha, iterations)
    print theta
    plot(arange(iterations), J_history)
    xlabel('Iterations')
    ylabel('Cost Function')
    show()

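You should consider designing your code around a class. You could make your file look like this (part of the code is taken from your question):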

from numpy import loadtxt, zeros, ones, array, genfromtxt, linspace, logspace, mean, std, arange
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from pylab import plot, show, xlabel, ylabel
class MyClass(object):
    def __init__(self, name):
        self.name = name

    def calculate_theta(self, name):
        # code calculating theta here (it must assign theta before returning)
        return theta

    def feature_normalize(self, X):
        # Scale every column to zero mean and unit standard deviation.
        mean_r = []
        std_r = []
        X_norm = X.copy()  # copy so the caller's array is not modified
        n_c = X.shape[1]
        for i in range(n_c):
            m = mean(X[:, i])
            s = std(X[:, i])
            mean_r.append(m)
            std_r.append(s)
            X_norm[:, i] = (X_norm[:, i] - m) / s
        return X_norm, mean_r, std_r

if __name__ == '__main__':
    # some_input_x and some_input_y are placeholders for your own inputs
    my_class = MyClass(some_input_x)
    my_class.calculate_theta(some_input_y)

Here you can find a better example of how to create a class.
