I'm building a decision tree with scikit-learn, and the tree is missing leaf #2. I'd like to know why. Here is my example:
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz

def leaf_ordering():
    X = np.genfromtxt('X.csv', delimiter=',')
    Y = np.genfromtxt('Y.csv', delimiter=',')
    dt = DecisionTreeClassifier(min_samples_leaf=100, random_state=99)
    dt.fit(X, Y)
    print(set(dt.apply(X)))

leaf_ordering()
Link to file X. Link to file Y.
Output: {1, 3, 4}. As you can see, there is no leaf #2.
Nodes 0 and 2 in your example are both non-leaf (internal) nodes, which is why they never show up in the output of apply(). In the example below, you can see from the export that nodes 0, 1 and 4 are internal tree nodes, while 2, 3, 5 and 6 are leaves, so every prediction falls into one of those 4 leaf nodes.
In [35]: X = np.random.random([100, 5])
In [36]: y = X.sum(axis=1) + np.random.random(100)
In [37]: dt = DecisionTreeRegressor(max_depth=2)
In [38]: dt.fit(X, y)
Out[38]:
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best')
In [39]: dt.apply(X)
Out[39]:
array([6, 3, 3, 3, 6, 6, 3, 6, 3, 6, 2, 3, 3, 5, 3, 5, 5, 6, 3, 3, 3, 3, 3,
3, 3, 6, 6, 3, 3, 3, 3, 5, 3, 5, 3, 3, 3, 3, 2, 3, 3, 3, 6, 3, 3, 3,
3, 6, 3, 5, 2, 3, 3, 6, 3, 3, 3, 3, 3, 6, 6, 3, 6, 6, 3, 5, 6, 3, 3,
3, 3, 6, 3, 3, 2, 3, 6, 2, 6, 2, 3, 3, 6, 2, 5, 6, 3, 3, 3, 6, 5, 3,
3, 3, 6, 6, 3, 3, 6, 5])
In [40]: export_graphviz(dt)
In [41]: !cat tree.dot
digraph Tree {
node [shape=box] ;
0 [label="X[2] <= 0.7003\nmse = 0.4442\nsamples = 100\nvalue = 3.0586"] ;
1 [label="X[4] <= 0.1842\nmse = 0.3332\nsamples = 65\nvalue = 2.8321"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="mse = 0.0426\nsamples = 7\nvalue = 1.9334"] ;
1 -> 2 ;
3 [label="mse = 0.2591\nsamples = 58\nvalue = 2.9406"] ;
1 -> 3 ;
4 [label="X[0] <= 0.3576\nmse = 0.3782\nsamples = 35\nvalue = 3.4791"] ;
0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
5 [label="mse = 0.1212\nsamples = 10\nvalue = 2.9395"] ;
4 -> 5 ;
6 [label="mse = 0.3179\nsamples = 25\nvalue = 3.695"] ;
4 -> 6 ;
4 -> 6 ;
}
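If you don't want to read the .dot export by hand, you can also identify the leaf ids programmatically: in scikit-learn's `tree_` structure a node is a leaf exactly when its `children_left` entry is -1. Here is a minimal sketch on synthetic data (the data here is made up for illustration, not your X.csv/Y.csv):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, just for illustration
rng = np.random.RandomState(0)
X = rng.random_sample((100, 5))
y = X.sum(axis=1) + rng.random_sample(100)

dt = DecisionTreeRegressor(max_depth=2, random_state=0)
dt.fit(X, y)

# A node is a leaf iff it has no children (children_left == -1)
leaf_ids = np.where(dt.tree_.children_left == -1)[0]
print("leaf node ids:", leaf_ids)

# apply() only ever returns leaf ids, so its output is a subset of leaf_ids
print("leaves reached:", set(dt.apply(X)))
```

This makes it easy to see that the "missing" ids are simply the internal split nodes, which apply() can never return.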