Custom scoring function in sklearn cross-validation

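I want to use a custom function with cross_validate that computes precision against a specific y_test, which is different from the actual target y_test.

I have tried a few approaches with make_scorer, but I don't know how to actually pass in my alternative y_test: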

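from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_validate

scoring = {'prec1': 'precision',
           'custom_prec1': make_scorer(precision_score)}
scores = cross_validate(pipeline, X, y, cv=5, scoring=scoring)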

Can anyone suggest a way to do this?

I found a way to do it. The code may not be optimal, sorry about that.

OK, let's begin:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
xTrain = np.random.rand(100, 100)
yTrain = np.random.randint(1, 4, (100, 1))
yTrainCV = np.random.randint(1, 4, (100, 1))
model = LogisticRegression()

yTrainCV is used here as the alternative target for the custom scorer.

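def customLoss(yTrue, yPred):
    # make_scorer passes the fold's true targets first (a DataFrame slice that
    # keeps the original row index) and the model's predictions second.
    indices = yTrue.index.values
    # Count predictions that disagree with the alternative targets for these rows.
    altTargets = yTrainCV[indices].ravel()
    return int(np.sum(np.asarray(yPred) != altTargets))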
# customLoss counts mismatches (lower is better), so let sklearn negate it.
scorer = {'main': 'accuracy',
          'custom': make_scorer(customLoss, greater_is_better=False)}

There are a few tricks here:

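customLoss must accept two arguments, because make_scorer always passes both: the fold's true targets first, then the model's predictions.
greater_is_better only sets the sign of the reported score: True returns the value as is, False negates it. customLoss counts mismatches, so lower is better, and greater_is_better=False is what makes GridSearchCV rank the parameter sets with fewer mismatches higher.
The row indices of each fold come from the cv splits that GridSearchCV makes; customLoss reads them from the index of its first argument.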
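The argument order and the sign behaviour described above can be checked on a toy case. This is only a minimal sketch: count_mismatches, the six-row data set, and the constant-prediction DummyClassifier are made up for illustration and are not part of the answer's code.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer


def count_mismatches(y_true, y_pred):
    # make_scorer passes the true targets first and the predictions second.
    return int(np.sum(np.asarray(y_true) != np.asarray(y_pred)))


# Hypothetical toy data: four rows of class 0 and two rows of class 1.
X = np.zeros((6, 1))
y = np.array([0, 0, 0, 0, 1, 1])

# A classifier that always predicts class 0, so exactly 2 of the 6 rows are wrong.
clf = DummyClassifier(strategy="constant", constant=0).fit(X, y)

as_score = make_scorer(count_mismatches, greater_is_better=True)
as_loss = make_scorer(count_mismatches, greater_is_better=False)

raw_value = as_score(clf, X, y)      # value returned unchanged
negated_value = as_loss(clf, X, y)   # same value with the sign flipped
print(raw_value, negated_value)
```

A scorer object is called as scorer(estimator, X, y), which is how cross_validate and GridSearchCV invoke it internally.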

And...

grid = GridSearchCV(model,
                    scoring=scorer,
                    cv=5,
                    param_grid={'C': [1e0, 1e1, 1e2, 1e3],
                                'class_weight': ['balanced', None]},
                    refit='custom')
    
grid.fit(xTrain, pd.DataFrame(yTrain))
print(grid.score(xTrain, pd.DataFrame(yTrain)))

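Don't forget the refit parameter in GridSearchCV. With several scorers it has to name the one used to pick the best parameters.
We pass the target array as a DataFrame here. This lets us recover the row indices inside the custom loss function.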
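The same index trick might be carried over to the cross_validate call from the question. The following is a rough sketch under stated assumptions, not a definitive implementation: y_alt, precision_vs_alt, and the random binary data are hypothetical, and it relies on scikit-learn handing the scorer the pandas slice of y unchanged, which is the same assumption the code above makes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = pd.Series(rng.integers(0, 2, 100))      # targets used for fitting
y_alt = pd.Series(rng.integers(0, 2, 100))  # hypothetical alternative targets, same index as y


def precision_vs_alt(y_true, y_pred):
    # y_true is the fold's slice of y; its index tells us which rows are in the fold.
    alt = y_alt.loc[y_true.index]
    # zero_division=0 avoids a warning if a fold has no positive predictions.
    return precision_score(alt, y_pred, zero_division=0)


scoring = {"prec1": "precision",
           "custom_prec1": make_scorer(precision_vs_alt)}
scores = cross_validate(LogisticRegression(), X, y, cv=5, scoring=scoring)
print(scores["test_prec1"], scores["test_custom_prec1"])
```

Here test_prec1 is the usual precision against y, while test_custom_prec1 scores the same predictions against y_alt, one value per fold.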
