Choosing n_estimators for AdaBoostClassifier based on dataset size

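I am trying to train a model and make predictions on a dataset of roughly 300 records and 100 features, using the code below. Are the n_estimators values I am searching over too high? Since I only have 300 records, would it make more sense to try something like [10, 20, 30] for n_estimators? Is n_estimators related to the size of the training dataset? What about the learning rate?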

Code:

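# sklearn.grid_search was removed in scikit-learn 0.20; GridSearchCV now lives in model_selection
from sklearn.model_selection import GridSearchCV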
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# TODO: Initialize the classifier
clf = AdaBoostClassifier(random_state=0)
# TODO: Create the parameters list you wish to tune
parameters = {'n_estimators':[100,200,300],'learning_rate':[1.0,2.0,4.0]}
# TODO: Make a scoring object (accuracy_score is used here, not fbeta_score)
scorer = make_scorer(accuracy_score)
# TODO: Perform grid search on the classifier using 'scorer' as the scoring method
grid_obj = GridSearchCV(clf,parameters,scoring=scorer)
# TODO: Fit the grid search object to the training data and find the optimal parameters
grid_fit = grid_obj.fit(X_train,y_train)
# Get the estimator
best_clf = grid_fit.best_estimator_
# Make predictions using the unoptimized and the optimized model
predictions = (clf.fit(X_train, y_train)).predict(X_test)
best_predictions = best_clf.predict(X_test)

Let's take them one at a time:

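n_estimators: Going by the definition of n_estimators, the more estimators you allow, the more trees are built and used in the weighted vote. So yes, I think you are doing the right thing by searching over large values. Note that the value is an upper bound: boosting stops early if a perfect fit is reached.
learning_rate: By definition, the learning rate controls how much each tree contributes to the final output. There is a trade-off between learning_rate and n_estimators: a lower rate shrinks each tree's contribution, so more trees are usually needed. I would start with a very low learning_rate, perhaps 0.001 or 0.01, rather than the values in your grid. That should make the model more robust and help you control variance on the dev/test set.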
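Here is a minimal sketch of how you could check both points empirically instead of guessing. It is not your exact setup: the data is a synthetic stand-in generated with make_classification (300 records, 100 features), and the grid values, cv=3, and the 75/25 split are illustrative assumptions. The first part searches over both small and large n_estimators together with low learning rates. The second part uses staged_score to see how held-out accuracy changes after each boosting round.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic data standing in for the real 300 x 100 dataset.
X, y = make_classification(n_samples=300, n_features=100,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Illustrative grid: covers small and large ensembles and low learning rates.
param_grid = {
    "n_estimators": [10, 50, 200],
    "learning_rate": [0.01, 0.1, 1.0],
}

# Cross-validated search over the grid, scored by accuracy.
grid = GridSearchCV(AdaBoostClassifier(random_state=0), param_grid,
                    scoring="accuracy", cv=3)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)
print("best CV accuracy:", grid.best_score_)

# staged_score yields the accuracy after each boosting round, so one fit
# with a large n_estimators shows where held-out accuracy levels off.
clf = AdaBoostClassifier(n_estimators=200, learning_rate=1.0, random_state=0)
clf.fit(X_train, y_train)
staged = list(clf.staged_score(X_test, y_test))
best_round = max(range(len(staged)), key=staged.__getitem__) + 1
print("rounds fitted:", len(staged))
print("best round on held-out data:", best_round, staged[best_round - 1])
```

If the best round is far below the largest n_estimators you searched, the extra trees are not buying you anything on a dataset this small. In practice, pick the number of rounds with a validation split or cross-validation rather than the final test set.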

Hope this helps :)
