Scikit-learn TypeError:如果未指定评分,则传递的估计器应具有"分数"方法



我已经使用scikit-learn在python中创建了一个自定义模型,我想使用交叉验证。

模型的类别定义如下:

class MultiLabelEnsemble:
''' MultiLabelEnsemble(predictorInstance, balance=False)
    Like OneVsRestClassifier: Wrapping class to train multiple models when 
    several objectives are given as target values. Its predictor may be an ensemble.
    This class can be used to create a one-vs-rest classifier from multiple 0/1 labels
    to treat a multi-label problem or to create a one-vs-rest classifier from
    a categorical target variable.
    Arguments:
        predictorInstance -- A predictor instance is passed as argument (be careful, you must instantiate
    the predictor class before passing the argument, i.e. end with (), 
    e.g. LogisticRegression().
        balance -- True/False. If True, attempts to re-balance classes in training data
        by including a random sample (without replacement) s.t. the largest class has at most 2 times
    the number of elements of the smallest one.
    Example Usage: mymodel =  MultiLabelEnsemble (GradientBoostingClassifier(), True)'''
def __init__(self, predictorInstance, balance=False):
    self.predictors = [predictorInstance]
    self.n_label = 1
    self.n_target = 1
    self.n_estimators =  1 # for predictors that are ensembles of estimators
    self.balance=balance
def __repr__(self):
    return "MultiLabelEnsemble"
def __str__(self):
    return "MultiLabelEnsemble : n" + "tn_label={}n".format(self.n_label) + "tn_target={}n".format(self.n_target) + "tn_estimators={}n".format(self.n_estimators) + str(self.predictors[0])
def fit(self, Xtrain, Ytrain):
    if len(Ytrain.shape)==1: 
        Ytrain = np.array([Ytrain]).transpose() # Transform vector into column matrix
        # This is NOT what we want: Y = Y.reshape( -1, 1 ), because Y.shape[1] out of range
    self.n_target = Ytrain.shape[1]                # Num target values = num col of Y
    self.n_label = len(set(Ytrain.ravel()))        # Num labels = num classes (categories of categorical var if n_target=1 or n_target if labels are binary )
    # Create the right number of copies of the predictor instance
    if len(self.predictors)!=self.n_target:
        predictorInstance = self.predictors[0]
        self.predictors = [predictorInstance]
        for i in range(1,self.n_target):
            self.predictors.append(copy.copy(predictorInstance))
    # Fit all predictors
    for i in range(self.n_target):
        # Update the number of desired prodictos
        if hasattr(self.predictors[i], 'n_estimators'):
            self.predictors[i].n_estimators=self.n_estimators
        # Subsample if desired
        if self.balance:
            pos = Ytrain[:,i]>0
            neg = Ytrain[:,i]<=0
            if sum(pos)<sum(neg): 
                chosen = pos
                not_chosen = neg
            else: 
                chosen = neg
                not_chosen = pos
            num = sum(chosen)
            idx=filter(lambda(x): x[1]==True, enumerate(not_chosen))
            idx=np.array(zip(*idx)[0])
            np.random.shuffle(idx)
            chosen[idx[0:min(num, len(idx))]]=True
            # Train with chosen samples            
            self.predictors[i].fit(Xtrain[chosen,:],Ytrain[chosen,i])
        else:
            self.predictors[i].fit(Xtrain,Ytrain[:,i])
    return
def predict_proba(self, Xtrain):
    if len(Xtrain.shape)==1: # IG modif Feb3 2015
        X = np.reshape(Xtrain,(-1,1))   
    prediction = self.predictors[0].predict_proba(Xtrain)
    if self.n_label==2:                 # Keep only 1 prediction, 1st column = (1 - 2nd column)
        prediction = prediction[:,1]
    for i in range(1,self.n_target): # More than 1 target, we assume that labels are binary
        new_prediction = self.predictors[i].predict_proba(Xtrain)[:,1]
        prediction = np.column_stack((prediction, new_prediction))
    return prediction

当我这样调用这个类进行交叉验证时:

kf = cross_validation.KFold(len(Xtrain), n_folds=10)
score = cross_val_score(self.model, Xtrain, Ytrain, cv=kf, n_jobs=-1).mean()

我得到以下错误:

TypeError:如果未指定评分,则传递的估计器应具有"score"方法。估计器MultiLabelEnsemble没有。

如何创建评分方法?

使错误消失的最简单方法是将scoring="accuracy"scoring="hamming"传递给cross_val_scorecross_val_score函数本身不知道你试图解决什么样的问题,所以它不知道什么是合适的度量。看起来你正在尝试进行多标签分类,所以也许你想使用hamming损失?

您还可以实现score方法,如"Roll your own estimater"文档中所述,该方法具有as签名def score(self, X, y_true)。看见http://scikit-learn.org/stable/developers/#different-对象

顺便说一句,你确实知道OneVsRestClassifier,对吧?它看起来有点像你正在重新实现它。

相关内容

  • 没有找到相关文章

最新更新