Python logistic regression (PatsyError: model is missing required outcome variables)



I am running the code below and keep getting PatsyError: model is missing required outcome variables.

When I run it with formula notation, it works fine:
logit_model = sm.logit('y ~ age+default+balance+housing+loan+duration+campaign+pdays+previous', data = train_data).fit()

But when I try again with NumPy array notation, I keep getting the error:

logit_model = sm.logit(X_train, Y_train).fit()

Here is my full code:

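# Imports are not shown in the original post; these are assumed from the calls used below
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm  # assumed: the formula call above only works with the formula API
from sklearn.model_selection import train_test_split

bank = pd.read_csv("C:/Bank.csv")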
bank.isnull().sum()
#convert to binary
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'y': {'no':0,'yes':1} },inplace=True)
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'default': {'no':0,'yes':1} },inplace=True)
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'loan': {'no':0,'yes':1} },inplace=True)
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'housing': {'no':0,'yes':1} },inplace=True)

bankx = bank[["age","default","balance","housing","loan","duration","campaign","pdays", "previous"]]
banky = bank["y"]
X = np.array(bankx)
Y = np.array(banky)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size=0.25)

logit_model = sm.logit(X_train, Y_train).fit()

First, let's use an example dataset:

import seaborn as sns
data = sns.load_dataset("iris")
data['target'] = (data['species'] == "versicolor").astype("float")
data.head()
   sepal_length  sepal_width  petal_length  petal_width species  target
0           5.1          3.5           1.4          0.2  setosa       0
1           4.9          3.0           1.4          0.2  setosa       0
2           4.7          3.2           1.3          0.2  setosa       0
3           4.6          3.1           1.5          0.2  setosa       0
4           5.0          3.6           1.4          0.2  setosa       0

If you want to use a formula, call the statsmodels formula API, which also gives you a summary of the results:

import statsmodels.formula.api as smf
mod = smf.logit(formula='target ~ sepal_length + petal_length', data=data)
res = mod.fit()
# you get a summary of the results
res.summary()
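
This also suggests where the error in the question comes from. The sketch below assumes, as the error message implies, that `sm` in the question was an alias for statsmodels.formula.api. It uses synthetic data rather than the bank data, and the array sizes and coefficients are made up for illustration. The formula-style logit takes the formula as its first argument, so passing X_train there means it is read as a formula with no left-hand side, and no outcome variable can be found. The exact exception type may differ between statsmodels versions, so the code catches a general Exception and only reports it.

```python
import numpy as np
import statsmodels.formula.api as smf

# Synthetic stand-in for X_train / Y_train (not the bank data)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
Y_train = (X_train[:, 0] + rng.normal(size=200) > 0).astype(float)

# smf.logit(formula, data): the first positional argument is the formula,
# so X_train is treated as a formula that has no outcome on its left-hand side.
error = None
try:
    smf.logit(X_train, Y_train).fit()
except Exception as exc:  # typically a PatsyError on patsy-based versions
    error = exc

print(type(error).__name__, error)
```

If this reproduces the message from the question, the fix is to either pass a formula string plus a DataFrame, or switch to the array interface.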

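You can also use NumPy arrays or pandas DataFrames. In that case you need to call Logit from statsmodels.api instead of the formula interface above, passing the outcome first and the predictors second: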

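import statsmodels.api as sm
X = data[['sepal_length','petal_length']]
y = data['target']
# sm.Logit takes the outcome first, then the predictors. Unlike the formula
# interface, it does not add an intercept; use sm.add_constant(X) if you want one.
mod = sm.Logit(y, X)
res = mod.fit()
res.summary()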
