How to run a logistic regression on all variables in Python with statsmodels (equivalent to R glm)

I want to run a logistic regression in Python.

My reference in R is:

model_1 <- glm(status_1 ~., data = X_train, family=binomial)
summary(model_1)

I am trying to translate this to Python, but I am not sure how to include all the variables.

import statsmodels.api as sm
model = sm.formula.glm("status_1 ~ ", family=sm.families.Binomial(), data=train).fit()
print(model.summary())

How do I use all the variables? In other words, what do I need to put after "status_1 ~"?

statsmodels makes logistic regression very simple, for example:

import statsmodels.api as sm
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]
log_reg = sm.Logit(ytrain, Xtrain).fit()

Here gmat, gpa and work_experience are your independent variables.

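From your question, I understand that you have binomial data and want to fit a generalized linear model with logit as the link function. Also, as you can see in this thread (jseabold's answer), the feature you mention does not exist in patsy yet. So I will show you how to fit a generalized linear model to binomial data using the sm.GLM() function.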

#Imports
import numpy as np
import pandas as pd
import statsmodels.api as sm
#Suppose that your train data is in a dataframe called data_train
#Let's split the data into dependent and independent variables

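At this point I want to mention that the dependent variable here is passed as a 2d array with two columns, as described in the help for the statsmodels GLM function:

Binomial family models accept a 2d array with two columns. If supplied, each observation is expected to be [success, failure].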

#Let's create the array which holds the dependent variable
y = data_train[["the name of the column of successes","the name of the column of failures"]]
#Let's create the array which holds the independent variables
X = data_train.drop(columns = ["the name of the column of successes","the name of the column of failures"])
#We have to add a constant in the array of the independent variables because by default constants
#aren't included in the model
X = sm.add_constant(X)
#It's time to create our model
logit_model = sm.GLM(
    endog=y,
    exog=X,
    family=sm.families.Binomial(link=sm.families.links.Logit()),
).fit()
#Let's see some information about our model
print(logit_model.summary())
