Why doesn't cross_val_score produce consistent results?



Each time this code is run, the result is different. Where does the randomness come from?

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

seed = 42
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Scale, project onto principal components, then fit a decision tree.
pipeline = Pipeline([('std', StandardScaler()),
                     ('pca', PCA(n_components=4)),
                     ('Decision_tree', DecisionTreeClassifier())],
                    verbose=False)

# Only the fold assignment is seeded here.
kfold = KFold(n_splits=10, random_state=seed, shuffle=True)
results = cross_val_score(pipeline, X, y, cv=kfold)
print(results.mean())

0.9466666666666667
0.9266666666666665
0.9466666666666667
0.9400000000000001
0.9266666666666665

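You assigned the seed to KFold but not to DecisionTreeClassifier, so the folds are reproducible while the trees are not. By default DecisionTreeClassifier considers all features at each split (max_features=None), but it always permutes the features randomly before searching for the best split. When several candidate splits improve the criterion equally, the one that is chosen depends on that permutation, so each run can build a different tree. Pass random_state to DecisionTreeClassifier to make the results repeatable. PCA also accepts random_state, although it only has an effect when the 'arpack' or 'randomized' SVD solver is used.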

See the documentation for DecisionTreeClassifier and PCA.
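
As a minimal sketch of that fix, the pipeline from the question can be rebuilt with random_state passed to the tree (and, harmlessly, to PCA). The helper name mean_cv_score is hypothetical and exists only to make repeated runs easy to compare; everything else comes from the question's code.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

seed = 42
X, y = datasets.load_iris(return_X_y=True)


def mean_cv_score(tree_seed):
    """Hypothetical helper: mean 10-fold accuracy for a given tree seed."""
    pipeline = Pipeline([
        ('std', StandardScaler()),
        # random_state is only used by the 'arpack' and 'randomized' solvers.
        ('pca', PCA(n_components=4, random_state=seed)),
        # Seeding the tree fixes the feature permutation used to break ties.
        ('Decision_tree', DecisionTreeClassifier(random_state=tree_seed)),
    ])
    kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
    return cross_val_score(pipeline, X, y, cv=kfold).mean()


# Repeating the seeded run should print the same mean every time.
seeded_scores = [mean_cv_score(seed) for _ in range(5)]
print(seeded_scores)
```

With every source of randomness seeded, the five values in seeded_scores should be identical. Calling mean_cv_score(None) instead leaves the tree unseeded, as in the question, and the means may then vary from run to run.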
