How is scikit-learn GridSearchCV's best_score_ computed, and what does that score mean?
I am working through a scikit-learn decision tree example and trying out various values for the scoring parameter.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

if __name__ == '__main__':
    df = pd.read_csv('/Users/tcssig/Downloads/ad-dataset/ad.data', header=None)
    explanatory_variable_columns = set(df.columns.values)
    response_variable_column = df[len(df.columns.values) - 1]
    # The last column describes the targets
    explanatory_variable_columns.remove(len(df.columns.values) - 1)
    y = [1 if e == 'ad.' else 0 for e in response_variable_column]
    X = df[list(explanatory_variable_columns)]
    # Missing values in ad.data are marked with ' ?'; replace them with -1
    X.replace(to_replace=' *\?', value=-1, regex=True, inplace=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    pipeline = Pipeline([('clf', DecisionTreeClassifier(criterion='entropy'))])
    # Note: newer scikit-learn releases require min_samples_split >= 2
    parameters = {'clf__max_depth': (150, 155, 160), 'clf__min_samples_split': (1, 2, 3), 'clf__min_samples_leaf': (1, 2, 3)}
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy')
    grid_search.fit(X_train, y_train)
    print('Best score: %0.3f' % grid_search.best_score_)
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print('\t%s: %r' % (param_name, best_parameters[param_name]))
    predictions = grid_search.predict(X_test)
    print(classification_report(y_test, predictions))
Each time I run this I get a different best_score_ value, ranging from 0.92 to 0.96.
Does this score decide which value of the scoring parameter I should ultimately use? Also, on the scikit-learn website I have read that accuracy should not be used when the classes are imbalanced.
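For what it's worth, here is a hedged sketch (not part of the original question): if the 'ad.' class is rare, GridSearchCV also accepts other built-in scorer names such as 'f1', which accounts for class imbalance better than accuracy. Reusing the pipeline, parameters, X_train and y_train defined above:

# Hypothetical variant of the grid search above, swapping accuracy for F1,
# which is usually more informative when the positive class is rare.
grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='f1')
grid_search.fit(X_train, y_train)
print('Best F1 score: %0.3f' % grid_search.best_score_)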
The best_score_ value differs on every run because you are not passing a fixed random_state to the DecisionTreeClassifier. You can do the following so that you get the same value every time you run the code, on any machine.
random_seed = 77  # it can be any value of your choice
pipeline = Pipeline([('clf', DecisionTreeClassifier(criterion='entropy', random_state=random_seed))])
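As a small follow-up sketch (assuming the same random_seed as above): train_test_split also shuffles randomly by default, so pinning its seed as well makes the whole run reproducible, not just the tree construction.

# Also fix the train/test split, reusing the random_seed defined above,
# so repeated runs train and evaluate on the same rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_seed)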