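I'm running the code below and keep getting PatsyError: model is missing required outcome variables.
When I run it with formula notation, it works fine:
logit_model = sm.logit('y ~ age+default+balance+housing+loan+duration+campaign+pdays+previous', data = train_data).fit()
But when I try again with NumPy array notation, I keep getting the error:
logit_model = sm.logit(X_train, Y_train).fit()
Here is my full code: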
bank = pd.read_csv("C:/Bank.csv")
bank.isnull().sum()
#convert to binary
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'y': {'no':0,'yes':1} },inplace=True)
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'default': {'no':0,'yes':1} },inplace=True)
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'loan': {'no':0,'yes':1} },inplace=True)
bank[bank.select_dtypes(['object']).columns]=bank.select_dtypes(['object']).apply(lambda x: x.astype('category'))
bank.replace(to_replace={'housing': {'no':0,'yes':1} },inplace=True)
bankx = bank[["age","default","balance","housing","loan","duration","campaign","pdays", "previous"]]
banky = bank["y"]
X = np.array(bankx)
Y = np.array(banky)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size=0.25)
logit_model = sm.logit(X_train, Y_train).fit()
First, let's set up an example dataset:
import seaborn as sns
data = sns.load_dataset("iris")
data['target'] = (data['species'] == "versicolor").astype("float")
data.head()
   sepal_length  sepal_width  petal_length  petal_width species  target
0           5.1          3.5           1.4          0.2  setosa       0
1           4.9          3.0           1.4          0.2  setosa       0
2           4.7          3.2           1.3          0.2  setosa       0
3           4.6          3.1           1.5          0.2  setosa       0
4           5.0          3.6           1.4          0.2  setosa       0
If you want to use a formula, call the statsmodels formula API, which gives you a fitted result you can summarize:
import statsmodels.formula.api as smf
mod = smf.logit(formula='target ~ sepal_length + petal_length', data=data)
res = mod.fit()
# you get a summary of the results
res.summary()
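One detail of the formula interface that matters when comparing it with the array interface is that it adds an intercept for you. Here is a minimal sketch on synthetic data; the column names x1, x2 and target and the coefficients used to generate the data are made up for illustration and are not part of the iris example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; the names and coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * df["x1"] - 1.0 * df["x2"])))
df["target"] = (rng.uniform(size=n) < p).astype(float)

# The formula interface takes a formula string plus a data frame.
res_formula = smf.logit("target ~ x1 + x2", data=df).fit(disp=0)

# The fitted parameters include an intercept that the formula never mentions.
param_names = list(res_formula.params.index)
print(param_names)
```

The printed names should start with Intercept, followed by the two predictors.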
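You can also use NumPy arrays or pandas data frames, but then you need to call Logit (capital L) from statsmodels.api rather than the formula interface above. The lowercase logit is the formula interface, so it expects a formula string and a data frame; passing arrays to it is what triggers the PatsyError. Note that Logit takes the response first and the predictors second: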
import statsmodels.api as sm
X = data[['sepal_length','petal_length']]
y = data['target']
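# Logit does not add an intercept automatically; use sm.add_constant(X) if you want one
mod = sm.Logit(y, X)
res = mod.fit()
res.summary()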