I don't know why it doesn't run any of the ensembles. Maybe I messed up the parameters?
Forest cover type data:
X shape: (581012, 54)
y shape: (581012,)
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn import model_selection
classifier_names = ["logistic regression", "linear SVM", "nearest centroids", "decision tree"]
classifiers = [LogisticRegression, LinearSVC, NearestCentroid, DecisionTreeClassifier]
ensemble1 = VotingClassifier(classifiers)
ensemble2 = BaggingClassifier(classifiers)
ensemble3 = AdaBoostClassifier(classifiers)
ensembles = [ensemble1, ensemble2, ensemble3]
seed = 7
for ensemble in ensembles:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    for classifier in classifiers:
        model = ensemble(base_estimator=classifier, random_state=seed)
        results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold)
        print(results.mean())
I expected the ensembles to run for each classifier, but the first ensemble did not run. I then changed the order to put BaggingClassifier first, but it showed the same error: the ensemble could not be called.
For VotingClassifier, `estimators` should be a list of (name, model) tuples. Note that you passed the bare model classes; each tuple needs an instantiated model.
From the docs:
estimators : list of (string, estimator) tuples. Invoking the fit method on the VotingClassifier will fit clones of those original estimators that will be stored in the class attribute self.estimators_. An estimator can be set to None using set_params.
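As a minimal sketch of that (name, estimator) format — using the iris toy dataset as a stand-in, since the covertype data isn't needed to show the API:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each entry is a (name, estimator-instance) tuple -- note the parentheses:
# LogisticRegression() is an instance, not the bare class.
voting = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
])
voting.fit(X, y)
print(len(voting.estimators_))  # fitted clones stored after fit -> 2
```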
The other two ensembles accept only a single base estimator plus an n_estimators count for that one base model. You can still loop over different classifiers, as in your code, but you must redefine the ensemble model on each iteration.
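That loop-and-redefine pattern can be sketched like this (again on the iris toy data, not the covertype set; the try/except handles the `base_estimator` → `estimator` parameter rename across scikit-learn versions):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=7)

scores = {}
for clf in [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=7)]:
    # Rebuild the ensemble on every iteration: one base model per ensemble.
    try:
        bagging = BaggingClassifier(estimator=clf, n_estimators=10, random_state=7)
    except TypeError:
        # scikit-learn < 1.2 named this parameter base_estimator
        bagging = BaggingClassifier(base_estimator=clf, n_estimators=10, random_state=7)
    scores[type(clf).__name__] = cross_val_score(bagging, X, y, cv=kfold).mean()
print(scores)
```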
base_estimator : object or None, optional (default=None). The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.
n_estimators : int, optional (default=10). The number of base estimators in the ensemble.
Try this!
from sklearn import datasets, model_selection
from sklearn.ensemble import VotingClassifier, BaggingClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestCentroid
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

classifier_names = ["logistic regression", "linear SVM", "nearest centroids", "decision tree"]
classifiers = [LogisticRegression(), LinearSVC(), NearestCentroid(), DecisionTreeClassifier()]

ensemble1 = VotingClassifier([(n, c) for n, c in zip(classifier_names, classifiers)])
ensemble2 = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=10)
ensemble3 = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=10)

ensembles = [ensemble1, ensemble2, ensemble3]
seed = 7
for ensemble in ensembles:
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    results = model_selection.cross_val_score(ensemble, X, y, cv=kfold)
    print(results.mean())
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestCentroid
from sklearn.tree import DecisionTreeClassifier
from sklearn import model_selection
import warnings
warnings.filterwarnings("ignore")

seed = 7
classifier_names = ["logistic regression", "linear SVM", "nearest centroids", "decision tree"]
classifiers = [LogisticRegression, LinearSVC, NearestCentroid, DecisionTreeClassifier]
for classifier in classifiers:
    # RandomForestClassifier always bags decision trees; it takes no base estimator.
    ensemble1 = RandomForestClassifier(n_estimators=20, random_state=seed)
    # SAMME only needs sample-weight support from the base estimator, not predict_proba.
    ensemble2 = AdaBoostClassifier(base_estimator=classifier(), algorithm="SAMME",
                                   n_estimators=5, learning_rate=1, random_state=seed)
    ensemble3 = BaggingClassifier(base_estimator=classifier(),
                                  max_samples=0.5, n_estimators=20, random_state=seed)
    # Instantiate each classifier; "hard" voting avoids requiring predict_proba,
    # which LinearSVC and NearestCentroid do not implement.
    ensemble4 = VotingClassifier([(n, c()) for n, c in zip(classifier_names, classifiers)],
                                 voting="hard")
    ensembles = [ensemble1, ensemble2, ensemble3, ensemble4]
    for ensemble in ensembles:
        kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
        results = model_selection.cross_val_score(ensemble, X[1:100], y[1:100], cv=kfold)
        print("Mean accuracy: {}".format(results.mean()))