"Too many indices for array" Sklearn 中make_scorer函数中的错误



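Goal: use Brier score loss to train a random forest with GridSearchCV.

Problem: when using make_scorer, the probability predictions for the target "y" have the wrong dimension.

After looking at this question, I am using the proxy function it suggests so that GridSearchCV can be trained with Brier score loss. Here is an example setup: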

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import brier_score_loss, make_scorer
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Proxy that selects one class column before calling the wrapped metric.
def ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs):
    return proxied_func(y_true, y_probs[:, class_idx], **kwargs)

brier_scorer = make_scorer(ProbaScoreProxy, greater_is_better=False,
                           needs_proba=True, class_idx=1,
                           proxied_func=brier_score_loss)

# Toy binary problem: the label depends only on the sign of the first feature.
X = np.random.randn(100, 2)
y = (X[:, 0] > 0).astype(int)

random_forest = RandomForestClassifier(n_estimators=10)
random_forest.fit(X, y)
probs = random_forest.predict_proba(X)

Passing probs and y directly to brier_score_loss or to ProbaScoreProxy does not raise an error:

ProbaScoreProxy(y, probs, 1, brier_score_loss)

Output:

0.0006

Now going through brier_scorer:

brier_scorer(random_forest, X, y)

Output:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-28-1474bb08e572> in <module>()
----> 1 brier_scorer(random_forest,X,y)
~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_scorer.py in __call__(self, estimator, X, y_true, sample_weight)
167                           stacklevel=2)
168         return self._score(partial(_cached_call, None), estimator, X, y_true,
--> 169                            sample_weight=sample_weight)
170 
171     def _factory_args(self):
~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_scorer.py in _score(self, method_caller, clf, X, y, sample_weight)
258                                                  **self._kwargs)
259         else:
--> 260             return self._sign * self._score_func(y, y_pred, **self._kwargs)
261 
262     def _factory_args(self):
<ipython-input-25-5321477444e1> in ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs)
5 
6 def ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs):
----> 7     return proxied_func(y_true, y_probs[:, class_idx], **kwargs)
8 
9 brier_scorer = make_scorer(ProbaScoreProxy, greater_is_better=False,                            needs_proba=True, class_idx=1, proxied_func=brier_score_loss)
IndexError: too many indices for array

So something inside make_scorer seems to change the dimension of the probability input, but I can't see what the problem is.

Versions:
- sklearn: '0.22.2.post1'
- numpy: '1.18.1'

Note that y has the correct dimension here (1-d). It is the dimension of the y_probs passed to ProbaScoreProxy that causes the problem.
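The IndexError itself can be reproduced with NumPy alone. The sketch below assumes that the scorer hands ProbaScoreProxy a 1-d array holding only the positive-class probabilities; that is an inference consistent with the traceback, not something the traceback states. The helper name select_class_column is hypothetical and only mirrors the indexing done inside the proxy.

```python
import numpy as np


def select_class_column(y_probs, class_idx):
    """Mirror the indexing used inside ProbaScoreProxy (hypothetical helper)."""
    return y_probs[:, class_idx]


# Shape returned by predict_proba for a binary problem: (n_samples, 2).
probs_2d = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])

# Assumed shape received from the scorer: the positive-class column only.
probs_1d = probs_2d[:, 1]

# Two-dimensional input: column selection works and yields a 1-d array.
print(select_class_column(probs_2d, 1).shape)

# One-dimensional input: the same indexing has one index too many.
try:
    select_class_column(probs_1d, 1)
except IndexError as exc:
    print("IndexError:", exc)
```

If that assumption holds, the proxy works when called by hand with the full predict_proba output and fails only when called through the scorer.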

Is this just poorly written code in that earlier question? What is the proper way to build a make_scorer object that GridSearchCV accepts for training the RF?

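Goal: use Brier score loss to train a random forest with GridSearchCV.

For this goal, you can pass the string value 'neg_brier_score' directly to the scoring parameter of GridSearchCV.

For example: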

gc = GridSearchCV(random_forest,
                  param_grid={"n_estimators": [5, 10]},
                  scoring="neg_brier_score")
gc.fit(X, y)
print(gc.scorer_)
# make_scorer(brier_score_loss, greater_is_better=False, needs_proba=True)
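If a custom proxy is still wanted, one option is to make it tolerate both input shapes. The following is only a sketch under assumptions: proba_score_proxy and brier are hypothetical names, brier is a hand-written stand-in for the binary Brier score (the mean squared difference between predicted probability and outcome) so that the example needs NumPy only, and a 1-d input is assumed to already hold the probabilities of the class of interest.

```python
import numpy as np


def brier(y_true, p_pos):
    """Binary Brier score: mean squared difference between probability and label."""
    y_true = np.asarray(y_true, dtype=float)
    p_pos = np.asarray(p_pos, dtype=float)
    return float(np.mean((p_pos - y_true) ** 2))


def proba_score_proxy(y_true, y_probs, class_idx, proxied_func, **kwargs):
    """Accept probabilities shaped (n_samples, n_classes) or (n_samples,)."""
    y_probs = np.asarray(y_probs)
    if y_probs.ndim == 2:
        # Full predict_proba output: pick the requested class column.
        y_probs = y_probs[:, class_idx]
    # A 1-d input is passed through unchanged; class_idx is ignored in that
    # case (assumption: it already refers to the class of interest).
    return proxied_func(y_true, y_probs, **kwargs)


y = np.array([0, 1, 1])
probs_2d = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])

# Both call styles give the same score.
print(proba_score_proxy(y, probs_2d, 1, brier))
print(proba_score_proxy(y, probs_2d[:, 1], 1, brier))
```

For the Brier score specifically, the built-in 'neg_brier_score' string remains the simpler choice; a proxy like this is only relevant for metrics that have no predefined scorer.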
