Here is my code:
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegressionCV
skf = StratifiedKFold(n_splits=5)
skf_1 = skf.split(titanic_dataset, surv_titanic)
ls_1 = np.logspace(-1.0, 2.0, num=500)
clf = LogisticRegressionCV(Cs=ls_1, cv = skf_1, scoring = "roc_auc", n_jobs=-1, random_state=17)
clf_model = clf.fit(x_train, y_train)
It fails with:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-130-b99a5912ff5a> in <module>
----> 1 clf_model = clf.fit(x_train, y_train)
H:\Anaconda_3\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
2098 # (n_classes, n_folds, n_Cs . n_l1_ratios) or
2099 # (1, n_folds, n_Cs . n_l1_ratios)
-> 2100 coefs_paths, Cs, scores, n_iter_ = zip(*fold_coefs_)
2101 self.Cs_ = Cs[0]
2102 if multi_class == 'multinomial':
ValueError: not enough values to unpack (expected 4, got 0)
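The training and test datasets were prepared earlier, and they work fine with other classifiers.
Such a generic error message tells me nothing. What is the problem here?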
In short, the problem is that you passed the result of skf.split(titanic_dataset, surv_titanic) to the cv parameter of LogisticRegressionCV, when you should pass StratifiedKFold(n_splits=5) directly.
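As a hedged illustration of why the message says "got 0": skf.split(...) returns a one-shot generator, and a generator that has already been consumed yields no folds at all. Whether it was consumed by an earlier fit call in your session is an assumption on my part, not something the traceback proves. The sketch below uses only the standard library; toy_split is a hypothetical stand-in for skf.split, and the tuple of strings stands in for the per-fold results that fit unpacks.

```python
def toy_split(n_samples, n_splits):
    """Hypothetical stand-in for skf.split(X, y): yields (train, test) index lists."""
    fold_size = n_samples // n_splits
    for k in range(n_splits):
        test = list(range(k * fold_size, (k + 1) * fold_size))
        train = [i for i in range(n_samples) if i not in test]
        yield train, test


splits = toy_split(n_samples=10, n_splits=5)

first_pass = list(splits)   # consumes the generator
second_pass = list(splits)  # nothing left to yield

print(len(first_pass), len(second_pass))

# With zero folds there are zero per-fold results, so unpacking into
# four names fails with the same message as in the traceback.
fold_results = [("coefs", "Cs", "scores", "n_iter") for _ in second_pass]
try:
    coefs_paths, Cs, scores, n_iter_ = zip(*fold_results)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

A splitter object such as StratifiedKFold(n_splits=5) does not have this problem, because the estimator calls its split method afresh on every fit.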
Below I show code that reproduces your error, followed by two alternatives that do what I think you are trying to do.
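import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold

# Some example data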
data = load_breast_cancer()
X = data['data']
y = data['target']
# Set up the stratifiedKFold
skf = StratifiedKFold(n_splits=5)
# Don't do this... only here to reproduce the error
skf_indices = skf.split(X, y)
# Some regularization
ls_1 = np.logspace(-1.0, 2.0, num=5)
# This creates your error
clf_error = LogisticRegressionCV(Cs=ls_1,
                                 cv=skf_indices,
                                 scoring="roc_auc",
                                 n_jobs=-1,
                                 random_state=17)
# Error created by passing result of skf.split to cv
clf_model = clf_error.fit(X, y)
# This is probably what you meant to do
clf_using_skf = LogisticRegressionCV(Cs=ls_1,
                                     cv=skf,
                                     scoring="roc_auc",
                                     n_jobs=-1,
                                     random_state=17,
                                     max_iter=1_000)
# This will now fit without the error
clf_model_skf = clf_using_skf.fit(X, y)
# This is the easiest method, and from the docs also does the
# same thing as StratifiedKFold
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
clf_easiest = LogisticRegressionCV(Cs=ls_1,
                                   cv=5,
                                   scoring="roc_auc",
                                   n_jobs=-1,
                                   random_state=17,
                                   max_iter=1_000)
# This will now fit without the error
clf_model_easiest = clf_easiest.fit(X, y)
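If you really do need explicit fold indices, for example to reuse the same folds across several models, one option is to materialize the splits into a list before passing them. This is only a sketch: it assumes scikit-learn's general convention that cv may be an iterable of (train, test) index arrays, and the names fold_indices and clf_from_list are mine, not from your code.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)

skf = StratifiedKFold(n_splits=5)
# A list can be iterated any number of times, unlike the generator
# returned by skf.split
fold_indices = list(skf.split(X, y))

clf_from_list = LogisticRegressionCV(Cs=np.logspace(-1.0, 2.0, num=5),
                                     cv=fold_indices,
                                     scoring="roc_auc",
                                     max_iter=1_000)
clf_from_list.fit(X, y)
clf_from_list.fit(X, y)  # refitting reuses the same five folds

print(clf_from_list.C_)
```

Unless you have a specific reason to fix the indices yourself, passing the StratifiedKFold object or cv=5 as above is simpler.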