Extracting decision rules from a random forest in Python



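I have a question. I heard from someone that in R you can use extra packages to extract the decision rules implemented in an RF. I tried Googling the same thing for Python but had no luck. Any help on how to achieve this would be appreciated. Thanks in advance!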

Assuming you use the sklearn RandomForestClassifier, you can find the individual decision trees in .estimators_. Each tree stores its decision nodes as a number of NumPy arrays under tree_.
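To get a feel for that layout, here is a minimal sketch that fits a small forest and reports the shapes of the arrays in its first tree. The helper name describe_tree and the use of the digits dataset are just choices made for this illustration.

```python
from sklearn import datasets, ensemble

def describe_tree(est):
    """Return the node count and array shapes of one fitted decision tree."""
    tree = est.tree_
    return {
        'node_count': tree.node_count,
        # Each of these arrays has one entry per node.
        'children_left': tree.children_left.shape,
        'children_right': tree.children_right.shape,
        'feature': tree.feature.shape,
        'threshold': tree.threshold.shape,
        # value has shape (node_count, n_outputs, n_classes).
        'value': tree.value.shape,
    }

digits = datasets.load_digits()
rf = ensemble.RandomForestClassifier(n_estimators=3, max_depth=2, random_state=0)
rf.fit(digits.data, digits.target)

info = describe_tree(rf.estimators_[0])
print(info)
```

A leaf is marked by -1 in both children_left and children_right, which is what the code further down checks for.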

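Here is some example code that simply prints each node in array order. In a typical application one would instead traverse the tree by following the children.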

import numpy
from sklearn.model_selection import train_test_split
from sklearn import datasets, ensemble

def print_decision_rules(rf):
    for tree_idx, est in enumerate(rf.estimators_):
        tree = est.tree_
        assert tree.value.shape[1] == 1  # no support for multi-output
        print('TREE: {}'.format(tree_idx))
        iterator = enumerate(zip(tree.children_left, tree.children_right, tree.feature, tree.threshold, tree.value))
        for node_idx, data in iterator:
            left, right, feature, th, value = data
            # left: index of left child (-1 if none)
            # right: index of right child (-1 if none)
            # feature: index of the feature to check
            # th: the threshold to compare against
            # value: per-class values for the samples that reached this node
            # for a classifier, the largest entry marks the class to return
            class_idx = numpy.argmax(value[0])
            if left == -1 and right == -1:
                print('{} LEAF: return class={}'.format(node_idx, class_idx))
            else:
                # sklearn sends a sample to the left child when feature <= threshold
                print('{} NODE: if feature[{}] <= {} then next={} else next={}'.format(node_idx, feature, th, left, right))

digits = datasets.load_digits()
Xtrain, Xtest, ytrain, ytest = train_test_split(digits.data, digits.target)
estimator = ensemble.RandomForestClassifier(n_estimators=3, max_depth=2)
estimator.fit(Xtrain, ytrain)
print_decision_rules(estimator)

Example output:

TREE: 0
0 NODE: if feature[33] <= 2.5 then next=1 else next=4
1 NODE: if feature[38] <= 0.5 then next=2 else next=3
2 LEAF: return class=2
3 LEAF: return class=9
4 NODE: if feature[50] <= 8.5 then next=5 else next=6
5 LEAF: return class=4
6 LEAF: return class=0
...
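
The traversal by following children mentioned above could look roughly like the sketch below. It walks each tree from the root (node 0) and collects one rule per leaf, where a rule is the list of conditions on the path to that leaf. The function name extract_rules and the output format are made up for this example, and like the code above it assumes a single-output classifier.

```python
import numpy
from sklearn import datasets, ensemble

def extract_rules(est):
    """Return a list of (conditions, class_idx) pairs, one per leaf of a fitted tree."""
    tree = est.tree_
    rules = []

    def walk(node, conditions):
        left, right = tree.children_left[node], tree.children_right[node]
        if left == -1 and right == -1:
            # Leaf: the largest entry in value marks the class to return.
            class_idx = int(numpy.argmax(tree.value[node][0]))
            rules.append((conditions, class_idx))
            return
        feature, th = tree.feature[node], tree.threshold[node]
        # Samples with feature <= threshold go left, the rest go right.
        walk(left, conditions + ['feature[{}] <= {}'.format(feature, th)])
        walk(right, conditions + ['feature[{}] > {}'.format(feature, th)])

    walk(0, [])
    return rules

digits = datasets.load_digits()
rf = ensemble.RandomForestClassifier(n_estimators=3, max_depth=2, random_state=0)
rf.fit(digits.data, digits.target)

for tree_idx, est in enumerate(rf.estimators_):
    print('TREE: {}'.format(tree_idx))
    for conditions, class_idx in extract_rules(est):
        print('  if {} then class={}'.format(' and '.join(conditions) or 'True', class_idx))
```

Note that class_idx is an index into rf.classes_, not necessarily the label itself. The forest's prediction combines the trees, so the rules of a single tree only describe that tree's vote.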

We use something similar in emlearn to compile a random forest to C code.
