I have created a custom model in Python using scikit-learn, and I want to use cross-validation with it.
The class for the model is defined as follows:
import copy
import numpy as np

class MultiLabelEnsemble:
    ''' MultiLabelEnsemble(predictorInstance, balance=False)
        Like OneVsRestClassifier: Wrapping class to train multiple models when
        several objectives are given as target values. Its predictor may be an ensemble.
        This class can be used to create a one-vs-rest classifier from multiple 0/1 labels
        to treat a multi-label problem or to create a one-vs-rest classifier from
        a categorical target variable.
        Arguments:
            predictorInstance -- A predictor instance is passed as argument (be careful, you must instantiate
        the predictor class before passing the argument, i.e. end with (),
        e.g. LogisticRegression().
            balance -- True/False. If True, attempts to re-balance classes in training data
            by including a random sample (without replacement) s.t. the largest class has at most 2 times
        the number of elements of the smallest one.
        Example Usage: mymodel = MultiLabelEnsemble (GradientBoostingClassifier(), True)'''

    def __init__(self, predictorInstance, balance=False):
        self.predictors = [predictorInstance]
        self.n_label = 1
        self.n_target = 1
        self.n_estimators = 1  # for predictors that are ensembles of estimators
        self.balance = balance

    def __repr__(self):
        return "MultiLabelEnsemble"

    def __str__(self):
        return "MultiLabelEnsemble : \n" + "\tn_label={}\n".format(self.n_label) + "\tn_target={}\n".format(self.n_target) + "\tn_estimators={}\n".format(self.n_estimators) + str(self.predictors[0])
    def fit(self, Xtrain, Ytrain):
        if len(Ytrain.shape) == 1:
            Ytrain = np.array([Ytrain]).transpose()  # Transform vector into column matrix
            # This is NOT what we want: Y = Y.reshape( -1, 1 ), because Y.shape[1] out of range
        self.n_target = Ytrain.shape[1]  # Num target values = num col of Y
        self.n_label = len(set(Ytrain.ravel()))  # Num labels = num classes (categories of categorical var if n_target=1 or n_target if labels are binary )
        # Create the right number of copies of the predictor instance
        if len(self.predictors) != self.n_target:
            predictorInstance = self.predictors[0]
            self.predictors = [predictorInstance]
            for i in range(1, self.n_target):
                self.predictors.append(copy.copy(predictorInstance))
        # Fit all predictors
        for i in range(self.n_target):
            # Update the number of desired predictors
            if hasattr(self.predictors[i], 'n_estimators'):
                self.predictors[i].n_estimators = self.n_estimators
            # Subsample if desired
            if self.balance:
                pos = Ytrain[:, i] > 0
                neg = Ytrain[:, i] <= 0
                if sum(pos) < sum(neg):
                    chosen = pos
                    not_chosen = neg
                else:
                    chosen = neg
                    not_chosen = pos
                num = sum(chosen)
                # Indices of the samples in the larger class
                # (replaces the Python 2 only filter/lambda/zip idiom)
                idx = np.flatnonzero(not_chosen)
                np.random.shuffle(idx)
                chosen[idx[0:min(num, len(idx))]] = True
                # Train with chosen samples
                self.predictors[i].fit(Xtrain[chosen, :], Ytrain[chosen, i])
            else:
                self.predictors[i].fit(Xtrain, Ytrain[:, i])
        return
    def predict_proba(self, Xtrain):
        if len(Xtrain.shape) == 1:  # IG modif Feb3 2015
            # Reshape a 1-D input into a single column (the result was previously assigned to an unused name)
            Xtrain = np.reshape(Xtrain, (-1, 1))
        prediction = self.predictors[0].predict_proba(Xtrain)
        if self.n_label == 2:  # Keep only 1 prediction, 1st column = (1 - 2nd column)
            prediction = prediction[:, 1]
        for i in range(1, self.n_target):  # More than 1 target, we assume that labels are binary
            new_prediction = self.predictors[i].predict_proba(Xtrain)[:, 1]
            prediction = np.column_stack((prediction, new_prediction))
        return prediction
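When I call this class for cross-validation like this:
    kf = cross_validation.KFold(len(Xtrain), n_folds=10)
    score = cross_val_score(self.model, Xtrain, Ytrain, cv=kf, n_jobs=-1).mean()
I get the following error:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator MultiLabelEnsemble does not.
How do I create a score method?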
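
Either pass scoring="accuracy" or scoring="hamming" to cross_val_score. The cross_val_score function itself doesn't know what kind of problem you are trying to solve, so it doesn't know what an appropriate metric is. It looks like you are trying to do multi-label classification, so maybe you want to use the Hamming loss?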
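
As a rough sketch of passing a scoring argument, the snippet below builds a Hamming-loss scorer explicitly with make_scorer, since whether a plain string such as "hamming" is accepted may depend on the scikit-learn version. Several things here are assumptions and not part of the question: the toy data from make_multilabel_classification stands in for Xtrain and Ytrain, KNeighborsClassifier stands in for the custom model (the custom class would likely also need get_params and predict before cross_val_score can clone and score it), and the newer sklearn.model_selection module is used in place of cross_validation.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss, make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy multi-label data (assumption: stands in for Xtrain / Ytrain).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)

# Stand-in estimator that accepts a 2-D 0/1 label matrix directly.
model = KNeighborsClassifier(n_neighbors=5)

# Hamming loss is a loss (lower is better), so greater_is_better=False
# makes the scorer return the negated loss.
hamming_scorer = make_scorer(hamming_loss, greater_is_better=False)

kf = KFold(n_splits=5)
scores = cross_val_score(model, X, Y, cv=kf, scoring=hamming_scorer)

# Undo the sign flip to report the mean Hamming loss across folds.
mean_hamming_loss = -scores.mean()
print(mean_hamming_loss)
```

Because the scorer negates the loss, larger (less negative) fold scores still mean better models, which is the convention cross_val_score expects.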
You can also implement a score method, as described in the "Roll your own estimator" docs, with the signature def score(self, X, y_true). See http://scikit-learn.org/stable/developers/#different-objects
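
A minimal sketch of such a score method follows, written as a mixin so it can be tried without the full class. The names HammingScoreMixin and ConstantProba are hypothetical, and the metric (fraction of correctly predicted labels, i.e. 1 minus the Hamming loss) and the 0.5 threshold are assumptions that only make sense when predict_proba returns one positive-class probability per binary label.

```python
import numpy as np

class HammingScoreMixin:
    """Hypothetical mixin: adds score(X, y_true) to a class exposing predict_proba."""

    def score(self, X, y_true):
        proba = np.asarray(self.predict_proba(X))
        y_true = np.asarray(y_true)
        # Assumption: each column is P(label == 1); threshold at 0.5.
        y_pred = (proba >= 0.5).astype(int)
        # Make (n,) and (n, 1) inputs comparable.
        y_pred = y_pred.reshape(len(y_pred), -1)
        y_true = y_true.reshape(len(y_true), -1)
        # Fraction of individual labels predicted correctly (higher is better).
        return float(np.mean(y_pred == y_true))

class ConstantProba(HammingScoreMixin):
    """Toy stand-in model that returns the same probabilities for every row."""

    def __init__(self, proba):
        self.proba = np.asarray(proba)

    def predict_proba(self, X):
        return np.tile(self.proba, (len(X), 1))

X_demo = np.zeros((4, 2))
Y_demo = np.array([[1, 0], [1, 1], [0, 0], [1, 0]])
demo_score = ConstantProba([0.9, 0.2]).score(X_demo, Y_demo)
print(demo_score)  # 6 of 8 labels match, so 0.75
```

Returning a value where higher is better matters, because cross_val_score treats the result of score that way when no scoring argument is given.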
By the way, you do know about OneVsRestClassifier, right? It looks a bit like you are reimplementing it.
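
For comparison, here is a small sketch of what the built-in wrapper looks like in use. The toy data and the choice of GradientBoostingClassifier with n_estimators=10 are assumptions for illustration, and the wrapper does not offer the balance option of the custom class.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multiclass import OneVsRestClassifier

# Toy multi-label data (assumption: stands in for Xtrain / Ytrain).
X, Y = make_multilabel_classification(
    n_samples=100, n_features=8, n_classes=3, random_state=0
)

# One copy of the base estimator is fitted per label column.
ovr = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=10, random_state=0))
ovr.fit(X, Y)

proba = ovr.predict_proba(X)  # one probability column per label
accuracy = ovr.score(X, Y)    # built-in score method (subset accuracy)
print(proba.shape, accuracy)
```

Since OneVsRestClassifier already provides score, predict and get_params, it can be handed to cross_val_score without a scoring argument.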