How to run a logistic regression on all variables with statsmodels in Python (equivalent to R's glm)



I want to run a logistic regression in Python.

My reference in R is:

model_1 <- glm(status_1 ~., data = X_train, family=binomial)
summary(model_1)

I am trying to convert this to Python, but I'm not sure how to include all the variables.

import statsmodels.api as sm
model = sm.formula.glm("status_1 ~ ", family=sm.families.Binomial(), data=train).fit()
print(model.summary())

How do I use all the variables? In other words, what do I need to put after status_1 ~ in the formula?

statsmodels makes logistic regression quite simple, for example:

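import statsmodels.api as sm
#df is assumed to be a pandas DataFrame that contains the columns below
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]
#Note: sm.Logit does not add an intercept; wrap Xtrain in sm.add_constant() if you want one
log_reg = sm.Logit(ytrain, Xtrain).fit()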

where gmat, gpa, and work_experience are your independent variables.

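From your question, I understand that you have binomial data and want to fit a generalized linear model with logit as the link function. Also, as you can see in this thread (jseabold's answer), the feature you mention does not exist in patsy yet. So I will show you how to fit a generalized linear model for binomial data using the sm.GLM() function.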

#Imports
import numpy as np
import pandas as pd
import statsmodels.api as sm
#Suppose that your train data is in a dataframe called data_train
#Let's split the data into dependent and independent variables

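At this stage, I want to mention that when the data is stored as counts of successes and failures, the dependent variable should be a 2d array with two columns, as the help for the statsmodels GLM function suggests (a single 0/1 column is also accepted):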

Binomial family models accept a 2d array with two columns. If supplied, each observation is expected to be [success, failure].

#Let's create the array which holds the dependent variable
y = data_train[["the name of the column of successes","the name of the column of failures"]]
#Let's create the array which holds the independent variables
X = data_train.drop(columns = ["the name of the column of successes","the name of the column of failures"])
#We have to add a constant in the array of the independent variables because by default constants
#aren't included in the model
X = sm.add_constant(X)
#It's time to create our model
logit_model = sm.GLM(
    endog = y,
    exog = X,
    family = sm.families.Binomial(link=sm.families.links.Logit())).fit()
#Let's see some information about our model
print(logit_model.summary())
