GridSearchCV gives a score of nan



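I'm having trouble using the GridSearchCV class to output the best parameters in my project. I created a class that includes my own scoring method. I tried to find a solution myself, but I probably lack the knowledge to solve it. My questions are:

  • Why do I get score=nan?
  • TypeError: fit() missing 1 required positional argument: 'y'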

I have created a toy example to reproduce the error; please ignore the logic of the output. Here is the code:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.base import BaseEstimator
from sklearn.model_selection import train_test_split


class Foo(BaseEstimator):
    def __init__(self, start=0, end=0):
        self.start = start
        self.end = end

    def fit(self, X, y):
        X_, X_val, y_, y_val = train_test_split(X, y, test_size=0.20, random_state=42)
        for i in range(self.start, self.end):
            X[i] = X[i]**2*y[i]
        return X

    def predict(self, X):
        val = np.max(X)
        return val

    def accuracy(self, x):
        if x > 50:
            return 100
        else:
            return 1

#=======================================================================
X = np.array(np.random.random(200)*100)
y = np.array(np.random.randint(2, size=200))
param_grid = {'start': [0, 10, 50], 'end': [60, 80, 100]}
foo = Foo()
scoring = make_scorer(foo.accuracy, greater_is_better=False)
grid = GridSearchCV(foo, param_grid=param_grid, scoring=scoring, verbose=3, cv=2, refit=True)
grid.fit(X, y) #fixed
print(grid.best_params_)
#=======================================================================

The error is as follows:

Fitting 2 folds for each of 9 candidates, totalling 18 fits
[CV 1/2] END .....................end=60, start=0;, score=nan total time=   0.0s
[CV 2/2] END .....................end=60, start=0;, score=nan total time=   0.0s
[CV 1/2] END ....................end=60, start=10;, score=nan total time=   0.0s
[CV 2/2] END ....................end=60, start=10;, score=nan total time=   0.0s
[CV 1/2] END ....................end=60, start=50;, score=nan total time=   0.0s
[CV 2/2] END ....................end=60, start=50;, score=nan total time=   0.0s
[CV 1/2] END .....................end=80, start=0;, score=nan total time=   0.0s
[CV 2/2] END .....................end=80, start=0;, score=nan total time=   0.0s
[CV 1/2] END ....................end=80, start=10;, score=nan total time=   0.0s
[CV 2/2] END ....................end=80, start=10;, score=nan total time=   0.0s
[CV 1/2] END ....................end=80, start=50;, score=nan total time=   0.0s
[CV 2/2] END ....................end=80, start=50;, score=nan total time=   0.0s
[CV 1/2] END ....................end=100, start=0;, score=nan total time=   0.0s
[CV 2/2] END ....................end=100, start=0;, score=nan total time=   0.0s
[CV 1/2] END ...................end=100, start=10;, score=nan total time=   0.0s
[CV 2/2] END ...................end=100, start=10;, score=nan total time=   0.0s
[CV 1/2] END ...................end=100, start=50;, score=nan total time=   0.0s
[CV 2/2] END ...................end=100, start=50;, score=nan total time=   0.0s
/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py:372: FitFailedWarning: 
18 fits failed out of a total of 18.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
18 fits failed with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 678, in _fit_and_score
estimator.fit(X_train, **fit_params)
TypeError: fit() missing 1 required positional argument: 'y'
warnings.warn(some_fits_failed_message, FitFailedWarning)

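You get TypeError: fit() missing 1 required positional argument: 'y' because you forgot to pass y in grid.fit(X). Fixing that also fixes the NaN scores: all 18 fits failed with this error, so every score was set to nan.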

In fact, the output also tells you the following:

The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
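As a minimal sketch of what that hint means in practice: passing error_score='raise' to GridSearchCV makes the first failing fit raise its exception instead of being recorded as nan. The NeedsY estimator below is a hypothetical stand-in for Foo, not the original class, and the data is made up for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV


class NeedsY(BaseEstimator):
    """Hypothetical toy estimator whose fit() requires y, like Foo above."""

    def __init__(self, start=0):
        self.start = start

    def fit(self, X, y):
        return self

    def predict(self, X):
        return np.zeros(len(X))

    def score(self, X, y):
        # Constant score; only needed so GridSearchCV accepts scoring=None.
        return 0.0


X = np.random.random((20, 1))
grid = GridSearchCV(NeedsY(), {"start": [0, 1]}, cv=2, error_score="raise")

try:
    grid.fit(X)  # y deliberately omitted to reproduce the failure
    raised = None
except TypeError as exc:
    raised = exc

print(raised)
```

With the default error_score (nan), the same call would only emit a FitFailedWarning and fill the results with nan, which is what the log in the question shows.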

Finally, I suggest you make sure you don't accidentally overflow when computing X[i] = X[i]**2*y[i].
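One way to check this, sketched here under the assumption that X holds float64 values as in the toy example, is to wrap the computation in np.errstate(over='raise') so that a floating-point overflow raises FloatingPointError instead of silently producing inf. The sample arrays are made up for illustration.

```python
import numpy as np

X = np.array([1.0, 50.0, 100.0])
y = np.array([1, 0, 1])

# Values up to 100 squared stay far below the float64 maximum, so this passes.
with np.errstate(over="raise"):
    safe = X**2 * y

# An artificially huge value shows what an overflow would look like.
try:
    with np.errstate(over="raise"):
        np.array([1e200]) ** 2
    overflowed = False
except FloatingPointError:
    overflowed = True

print(safe, overflowed)
```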

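Edit:
In the comments you mention that you now get the error TypeError: accuracy() takes 2 positional arguments but 3 were given. Your accuracy method is used in make_scorer(). If you look at the documentation, you'll see that a score function with the signature score_func(y, y_pred, **kwargs) is expected, i.e. your accuracy method is missing one parameter (self is not y). Also, since accuracy doesn't use self in the first place, I suggest moving it out of the class and making it a "normal" function.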
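A minimal sketch of that suggestion follows. The accuracy function here is a plain module-level function taking (y_true, y_pred); its body (fraction of matching labels) and the ThresholdClassifier estimator are assumptions made up for illustration, not the original Foo logic.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV


def accuracy(y_true, y_pred):
    """Plain function with the (y_true, y_pred) signature make_scorer expects."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


class ThresholdClassifier(BaseEstimator):
    """Hypothetical toy model: predicts 1 when the feature exceeds threshold."""

    def __init__(self, threshold=50.0):
        self.threshold = threshold

    def fit(self, X, y):
        # Nothing to learn in this toy model; fit returns self by convention.
        self.is_fitted_ = True
        return self

    def predict(self, X):
        return (np.asarray(X)[:, 0] > self.threshold).astype(int)


rng = np.random.default_rng(0)
X = rng.random((200, 1)) * 100
y = (X[:, 0] > 50).astype(int)  # labels that a threshold of 50 reproduces exactly

grid = GridSearchCV(
    ThresholdClassifier(),
    param_grid={"threshold": [10.0, 50.0, 90.0]},
    scoring=make_scorer(accuracy),  # greater_is_better defaults to True
    cv=2,
)
grid.fit(X, y)
print(grid.best_params_)
```

Because the scorer receives the true labels and the output of predict, predict needs to return one prediction per sample rather than a single value.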

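Finally, I get the impression that you are new to Python or not yet confident with it. I suggest you spend some time truly understanding what these errors mean and how to use external packages such as numpy and sklearn (by reading the docs ;)).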
