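I'm having trouble getting the best parameters for my project out of the GridSearchCV class. I created a class and included my own scoring method. I tried to find a solution, but my lack of knowledge may be why I couldn't solve it. The problems are:
- Why do I get score=nan?
- TypeError: fit() missing 1 required positional argument: 'y'
I've created a toy example to reproduce the error; please ignore the logic of the output. Here is the code: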
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.base import BaseEstimator
from sklearn.model_selection import train_test_split

class Foo(BaseEstimator):
    def __init__(self, start=0, end=0):
        self.start = start
        self.end = end

    def fit(self, X, y):
        X_, X_val, y_, y_val = train_test_split(X, y, test_size=0.20, random_state=42)
        for i in range(self.start, self.end):
            X[i] = X[i]**2*y[i]
        return X

    def predict(self, X):
        val = np.max(X)
        return val

    def accuracy(self, x):
        if x > 50:
            return 100
        else:
            return 1

#=======================================================================
X = np.array(np.random.random(200)*100)
y = np.array(np.random.randint(2,size=200))
param_grid = {'start':[0, 10, 50], 'end':[60, 80, 100]}
foo = Foo()
scoring = make_scorer(foo.accuracy, greater_is_better=False)
grid = GridSearchCV(foo, param_grid=param_grid, scoring=scoring, verbose = 3, cv=2, refit=True)
grid.fit(X, y) #fixed
print(grid.best_params_)
#=======================================================================
The error is as follows:
Fitting 2 folds for each of 9 candidates, totalling 18 fits
[CV 1/2] END .....................end=60, start=0;, score=nan total time= 0.0s
[CV 2/2] END .....................end=60, start=0;, score=nan total time= 0.0s
[CV 1/2] END ....................end=60, start=10;, score=nan total time= 0.0s
[CV 2/2] END ....................end=60, start=10;, score=nan total time= 0.0s
[CV 1/2] END ....................end=60, start=50;, score=nan total time= 0.0s
[CV 2/2] END ....................end=60, start=50;, score=nan total time= 0.0s
[CV 1/2] END .....................end=80, start=0;, score=nan total time= 0.0s
[CV 2/2] END .....................end=80, start=0;, score=nan total time= 0.0s
[CV 1/2] END ....................end=80, start=10;, score=nan total time= 0.0s
[CV 2/2] END ....................end=80, start=10;, score=nan total time= 0.0s
[CV 1/2] END ....................end=80, start=50;, score=nan total time= 0.0s
[CV 2/2] END ....................end=80, start=50;, score=nan total time= 0.0s
[CV 1/2] END ....................end=100, start=0;, score=nan total time= 0.0s
[CV 2/2] END ....................end=100, start=0;, score=nan total time= 0.0s
[CV 1/2] END ...................end=100, start=10;, score=nan total time= 0.0s
[CV 2/2] END ...................end=100, start=10;, score=nan total time= 0.0s
[CV 1/2] END ...................end=100, start=50;, score=nan total time= 0.0s
[CV 2/2] END ...................end=100, start=50;, score=nan total time= 0.0s
/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py:372: FitFailedWarning:
18 fits failed out of a total of 18.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
18 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 678, in _fit_and_score
    estimator.fit(X_train, **fit_params)
TypeError: fit() missing 1 required positional argument: 'y'
  warnings.warn(some_fits_failed_message, FitFailedWarning)
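You get TypeError: fit() missing 1 required positional argument: 'y' because you forgot to pass y in grid.fit(X). Every fit fails with that error, so every score is set to NaN; passing y fixes the NaN scores as well.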
In fact, the output also tells you the following:
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
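The hint in that warning can be sketched as follows. This is a minimal illustration, not the question's code: NeedsY is a hypothetical stand-in estimator whose fit() requires y, and the data and parameter grid are made up. With error_score='raise', the search should stop at the first failing fit and surface the underlying exception instead of recording NaN.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV


class NeedsY(BaseEstimator):
    """Hypothetical minimal estimator whose fit() requires y."""

    def __init__(self, start=0):
        self.start = start

    def fit(self, X, y):
        # A real estimator would learn something here; this one only returns self.
        return self

    def predict(self, X):
        return np.zeros(len(X))

    def score(self, X, y):
        # Constant score, enough for GridSearchCV to rank the candidates.
        return 0.0


X = np.random.random((20, 1))
y = np.random.randint(2, size=20)

# error_score='raise' re-raises the first exception thrown during fitting.
grid = GridSearchCV(NeedsY(), param_grid={"start": [0, 1]}, cv=2, error_score="raise")

try:
    grid.fit(X)  # y omitted on purpose to reproduce the failure
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Passing y lets every fit succeed, so best_params_ becomes available.
grid.fit(X, y)
print(grid.best_params_)
```

Once the cause is fixed, the default error_score can be restored if occasional failed fits are acceptable.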
Finally, I'd suggest making sure that the computation X[i] = X[i]**2*y[i] does not accidentally overflow.
Edit:
In the comments you mentioned that you now get the error TypeError: accuracy() takes 2 positional arguments but 3 were given.
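Your accuracy method is used in make_scorer(). If you look at the documentation, you'll see that a score function with the signature score_func(y, y_pred, **kwargs) is expected. In other words, your accuracy method is missing a parameter (self is not y): because you pass the bound method foo.accuracy, the scorer's two arguments plus self add up to the 3 arguments in the error message. Also, since accuracy doesn't use self in the first place, I'd suggest moving it out of the class and making it a "normal" function.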
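A module-level score function with that signature might look like the sketch below. The threshold logic is an assumption that loosely mirrors the question's toy accuracy (here applied to the largest prediction), and ConstantPredictor is a hypothetical estimator used only to exercise the scorer.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.metrics import make_scorer


def accuracy(y_true, y_pred):
    """Toy metric: first argument is the true targets, second the predictions."""
    # Assumption: keeps the question's threshold of 50, applied to the largest prediction.
    return 100 if np.max(y_pred) > 50 else 1


class ConstantPredictor(BaseEstimator):
    """Hypothetical estimator that always predicts the same value."""

    def __init__(self, value=0.0):
        self.value = value

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        # One prediction per sample, as scorers generally expect.
        return np.full(len(X), self.value)


# greater_is_better=False makes the scorer negate the metric's return value.
scorer = make_scorer(accuracy, greater_is_better=False)

X = np.random.random((10, 1)) * 100
y = np.random.randint(2, size=10)

# A scorer is called as scorer(estimator, X, y); it calls predict internally.
high_score = scorer(ConstantPredictor(value=75.0).fit(X, y), X, y)
low_score = scorer(ConstantPredictor(value=10.0).fit(X, y), X, y)
print(high_score, low_score)
```

The resulting scorer object can be passed as scoring= to GridSearchCV. Keep in mind that with greater_is_better=False the search treats the metric as a loss, so smaller raw values are ranked as better.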
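Finally, I get the impression that you are either new to Python or not yet confident with it. I'd suggest taking some time to really understand what these errors mean and how to use external packages such as numpy and sklearn (by reading the documentation ;)).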