scikit-learn cross_validation: need more information about the resulting scores

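I'm trying to generate some "which engine works best" data for a project I'm working on. My general idea is something very simple: pick an engine, cross-validate it, and build a list of all the cross-validation results; the largest one is the "best". All tests are run on the same set of training data. Here is a snippet of the idea. I would then wrap it in a loop: instead of setting simple_clf to svm.SVC(), I would loop over a list of engines and run the rest of the code for each one. The base data is in featurevecs, and scorenums holds the corresponding score, 0 to 9, that each base data item should produce.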

    from sklearn import svm, grid_search, cross_validation
    from sklearn.cross_validation import train_test_split

    # featurevecs, scorenums and CLFPARAMS are defined elsewhere
    X_train, X_test, y_train, y_test = train_test_split(
        featurevecs, scorenums, test_size = 0.333, random_state = 0 )
    # this would be in a loop of engine types but I'm just making sure basic code works
    simple_clf = svm.SVC()
    simple_clf = grid_search.GridSearchCV( simple_clf, CLFPARAMS, cv = 3 )
    simple_clf.fit( X_train, y_train )
    # the folds must index the data actually being cross-validated (X_test, not X_train)
    kf = cross_validation.KFold( len( X_test ), k = 5 )
    scores = cross_validation.cross_val_score( simple_clf, X_test,
                                               y_test, cv = kf )
    print scores.mean(), scores.std() / 2
    # loop would end here
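
A minimal sketch of the "largest is best" comparison described above, assuming each engine's per-fold scores have already been collected into a dict keyed by engine name. The helper name pick_best_engine and the sample numbers are illustrative only, not results from the question's data.

```python
import numpy as np

def pick_best_engine(engine_scores):
    """Return (name, mean, std) for the engine with the highest mean fold score.

    engine_scores maps an engine name to the per-fold scores that
    cross_val_score returned for it (hypothetical helper).
    """
    summary = []
    for name, scores in engine_scores.items():
        scores = np.asarray(scores, dtype=float)
        summary.append((name, scores.mean(), scores.std()))
    # "Best" is simply the engine with the largest mean score across folds
    return max(summary, key=lambda item: item[1])

# Illustrative numbers only
results = {
    "svc": [0.60, 0.62, 0.58],
    "tree": [0.55, 0.57, 0.56],
}
best = pick_best_engine(results)
print(best[0])  # svc
```

Ranking by the mean alone ignores the spread between folds; printing the std next to it, as the snippet above does, shows whether a winner is clear or within noise.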

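My problem is that, when it comes to saying what is "best", scores doesn't give me what I need. scores provides .mean() and .std() for printing, but I don't want the engine credited only for exact matches; I also want "close" matches to count. In my case, close means the numeric score is within 1 of the expected score. That is, if the expected score is 3, then 2, 3 or 4 should count as a match and a good result.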
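
The "within 1" rule can be written as a plain function of the expected and predicted scores. This is a sketch assuming integer scores 0 to 9 held in lists or NumPy arrays; within_one_accuracy and its tolerance parameter are hypothetical names. With tolerance=0 it reduces to ordinary exact-match accuracy.

```python
import numpy as np

def within_one_accuracy(y_true, y_pred, tolerance=1):
    """Fraction of predictions whose distance to the expected score is <= tolerance."""
    # Convert to float so subtraction cannot wrap around for unsigned dtypes
    deltas = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(deltas <= tolerance))

# Expected 3 each time: predictions 2, 3 and 4 count, 6 does not
print(within_one_accuracy([3, 3, 3, 3], [2, 3, 4, 6]))  # 0.75
```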

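I looked through the documentation, and the latest bleeding-edge scikit-learn seems to have added something to the metrics package that allows a custom scoring function to be passed to grid search. I'm not sure that is enough for my needs, though, since I would also need to pass it to cross_val_score and not just to grid_search, wouldn't I? Either way, that isn't an option: I'm locked into the version of scikit-learn I have to use.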

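I also noticed references to cross_val_predict in the latest bleeding-edge version, which looks like exactly what I need, but again I'm locked into the version I'm using.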

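Before the bleeding edge, what did people do when the definition of "good" for cross-validation didn't exactly match the default one? Surely something was done; I just need to be pointed in the right direction.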
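
One version-independent approach is to run the folds by hand: fit a fresh model on each training split, keep its predictions for the held-out split, and then apply whatever definition of "good" you like to the full set of out-of-fold predictions. The sketch below uses only NumPy; kfold_indices, out_of_fold_predictions and the tiny NearestNeighbour classifier are hypothetical stand-ins (in practice make_estimator would return something like the grid-searched SVC), and the folds are contiguous and unshuffled.

```python
import numpy as np

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = np.full(n_folds, n_samples // n_folds, dtype=int)
    fold_sizes[: n_samples % n_folds] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        test_idx = np.arange(start, start + size)
        train_idx = np.concatenate([np.arange(0, start), np.arange(start + size, n_samples)])
        yield train_idx, test_idx
        start += size

def out_of_fold_predictions(make_estimator, X, y, n_folds=5):
    """Predict every sample with a model that never saw it during fit."""
    X = np.asarray(X)
    y = np.asarray(y)
    y_pred = np.empty_like(y)
    for train_idx, test_idx in kfold_indices(len(y), n_folds):
        est = make_estimator()  # fresh, unfitted model for each fold
        est.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = est.predict(X[test_idx])
    return y_pred

class NearestNeighbour(object):
    """Tiny 1-NN classifier, a hypothetical stand-in for a real engine."""
    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        # Squared Euclidean distance from every query row to every training row
        d = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        return self.y_[d.argmin(axis=1)]

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10) // 2  # labels 0, 0, 1, 1, ..., 4, 4
preds = out_of_fold_predictions(NearestNeighbour, X, y, n_folds=5)
exact = np.mean(preds == y)
close = np.mean(np.abs(preds - y) <= 1)
print(exact, close)  # 0.0 1.0
```

Because the predictions are kept, the exact and the within-1 scores both come from one set of fits instead of cross-validating twice.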

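Because of company IT policy I'm stuck with scikit-learn 0.11: only approved software may be used, and the version that was approved a while ago is my only option.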

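Here is what I changed. Following the helpful hint, I looked at cross_val_score in the 0.11 documentation and found that it accepts a custom score function, which I can write myself as long as it matches the expected parameters. This is the code I have now. Will it do what I want, i.e. produce results based not only on exact matches but also on "close" predictions, where close is defined as within 1?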

    from __future__ import print_function
    import numpy as np

    # KLUDGE way of changing testing from match to close:
    # the first SCORE_CROSSOVER_COUNT calls score exact matches only,
    # every later call also counts predictions that are off by 1
    SCORE_COUNT = 0
    SCORE_CROSSOVER_COUNT = 0
    def my_custom_score_function( y_true, y_pred ):
        global SCORE_COUNT, SCORE_CROSSOVER_COUNT
        close_applies = SCORE_COUNT >= SCORE_CROSSOVER_COUNT
        SCORE_COUNT += 1
        print( close_applies, SCORE_CROSSOVER_COUNT, SCORE_COUNT )
        # absolute distance between expected and predicted scores
        deltas = np.abs( np.asarray( y_true ) - np.asarray( y_pred ) )
        good = 0
        for delta in deltas:
            if delta == 0:
                good += 1
            elif close_applies and delta == 1:
                good += 1
        return float( good ) / float( len( y_true ) )

Snippet from the main routine:

        # requires: from __future__ import print_function (Python 2)
        fold_count = 5
        # KLUDGE way of changing testing from match to close
        # set global variables for custom scorer function: the first
        # fold_count calls (one per fold) score exact matches only.
        # The counter lives in this process, so keep the default n_jobs=1.
        global SCORE_COUNT, SCORE_CROSSOVER_COUNT
        SCORE_COUNT = 0
        SCORE_CROSSOVER_COUNT = fold_count
        # do a simple cross validation
        simple_clf = svm.SVC()
        simple_clf = grid_search.GridSearchCV( simple_clf, CLFPARAMS, cv = 3 )
        simple_clf.fit( X_train, y_train )
        print( '{0} '.format( test_type ), end = "" )
        kf = cross_validation.KFold( len( X_train ), k = fold_count )
        # first pass: exact matches only
        scores = cross_validation.cross_val_score( simple_clf, X_train, y_train,
                                                   cv = kf,
                                                   score_func = my_custom_score_function )
        print( 'Accuracy (+/- 0) {0:0.4f} (+/- {1:0.4f}) '.format( scores.mean(),
                                                                  scores.std() / 2 ),
               end = "" )
        # second pass: same call, but the counter now enables "close" (within 1)
        scores = cross_validation.cross_val_score( simple_clf, X_train, y_train,
                                                   cv = kf,
                                                   score_func = my_custom_score_function )
        print( 'Accuracy (+/- 1) {0:0.4f} (+/- {1:0.4f}) '.format( scores.mean(),
                                                                  scores.std() / 2 ),
               end = "" )
        print( "" )
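
The global counter works only because the first fold_count calls happen to belong to the exact-match pass. A less fragile sketch, assuming the 0.11 score_func(y_true, y_pred) signature described in the question, builds one independent scoring function per tolerance with a closure, so each cross_val_score call gets its own function and no shared state has to be reset between engines. make_tolerance_score is a hypothetical name.

```python
import numpy as np

def make_tolerance_score(tolerance):
    """Return a score_func(y_true, y_pred) that counts |error| <= tolerance as correct."""
    def score(y_true, y_pred):
        # Float conversion avoids surprises with unsigned or mixed dtypes
        deltas = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
        return float(np.mean(deltas <= tolerance))
    return score

exact_score = make_tolerance_score(0)  # same as plain accuracy
close_score = make_tolerance_score(1)  # "close" = within 1

# Hypothetical use with scikit-learn 0.11, one call per definition of "good":
# scores_exact = cross_validation.cross_val_score(simple_clf, X_train, y_train,
#                                                 cv=kf, score_func=exact_score)
# scores_close = cross_validation.cross_val_score(simple_clf, X_train, y_train,
#                                                 cv=kf, score_func=close_score)

print(exact_score([3, 3, 3], [3, 4, 6]), close_score([3, 3, 3], [3, 4, 6]))
```

Since nothing depends on call order, the two passes can run in either order, and adding a third tolerance is one more call to make_tolerance_score.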

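You can find the documentation for cross_val_score in 0.11 here. You can provide a custom scoring function as the score_func argument; the interface is different. Aside from that: why are you "locked" into the current version? Releases are usually backward compatible across two versions.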
