Extracting rules to predict child nodes or probability scores in a decision tree



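I am relatively new to implementing decision trees. I am trying to extract rules that predict only the child (leaf) nodes, and I need them to produce probability scores for new data (not just the final classification), and possibly to hand the algorithm over to other users. Is there an easy way to do this? I found some solutions (How to extract the decision rules from scikit-learn decision-tree?). However, when I tested them, for some reason I did not get all of the child nodes (my tree is very large). Any suggestions would be appreciated. Thanks.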

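I have updated the first snippet from the link above to output the node IDs, and it seems to work best with large trees. However, I am struggling to make it work with pandas DataFrames. Here is an example:

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier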

Dummy data:

df = pd.DataFrame({'col1':[0,1,2,3],'col2':[3,4,5,6],'dv':[0,1,0,1]})
df
# create decision tree
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df.loc[:, ['col1', 'col2']], df.dv)
from sklearn.tree import _tree
def tree_to_code(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    print ("def tree({}):".format(", ".join(feature_names)))
    def recurse(node, depth):
        indent = "  " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print ("{}if {} <= {}:".format(indent, name, threshold))
            recurse(tree_.children_left[node], depth + 1)
            print ("{}else:  # if {} > {}".format(indent, name, threshold))
            recurse(tree_.children_right[node], depth + 1)
        else:
            print ("{}return {}".format(indent, node))
    recurse(0, 1)
tree_to_code(dt, df.columns)

The call above produces the following code:

def tree(col1, col2, dv):
  if col2 <= 3.5:
    return 1
  else:  # if col2 > 3.5
    if col1 <= 1.5:
      return 3
    else:  # if col1 > 1.5
      if col1 <= 2.5:
        return 5
      else:  # if col1 > 2.5
        return 6

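However, when I call the generated function as shown below, I get an error saying that an argument is missing. How can I modify the code so that it works on a pandas DataFrame?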

tree('col1', 'col2', 'dv_pred')

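Here is a working solution. The generated function takes a single row x and looks features up by column name, so it can be applied row by row to a DataFrame: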

import pandas as pd
from sklearn.tree import _tree
from sklearn.tree import DecisionTreeClassifier
df = pd.DataFrame({'col1':[0,1,2,3],'col2':[3,4,5,6],'dv':[0,1,0,1]})
# create decision tree
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
features = ['col1','col2']
dt.fit(df.loc[:,features], df.dv)

def tree_to_code(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    print ("def tree(x):")
    def recurse(node, depth):
        indent = "  " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print ("{}if x['{}'] <= {}:".format(indent, name, threshold))
            recurse(tree_.children_left[node], depth + 1)
            print ("{}else:  # if x['{}'] > {}".format(indent, name, threshold))
            recurse(tree_.children_right[node], depth + 1)
        else:
            print ("{}return {}".format(indent, node))
    recurse(0, 1)
tree_to_code(dt,  df[features].columns)
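Because tree_to_code only prints the function, the output has to be copied into the session before it can be called. As a rough sketch of one way around that (not part of the original answer), the same traversal can collect the lines into a string and define the function with exec. The names tree_to_source and namespace are hypothetical, and exec should only be run on source you generated yourself.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, _tree


def tree_to_source(clf, feature_names, func_name="tree"):
    """Return Python source for a function mapping a row to its leaf node id.

    Hypothetical helper: same traversal as tree_to_code, but it builds a
    string instead of printing.
    """
    t = clf.tree_
    lines = ["def {}(x):".format(func_name)]

    def recurse(node, depth):
        indent = "  " * depth
        if t.feature[node] != _tree.TREE_UNDEFINED:
            # Internal node: emit the split on this feature.
            name = feature_names[t.feature[node]]
            threshold = float(t.threshold[node])
            lines.append("{}if x[{!r}] <= {!r}:".format(indent, name, threshold))
            recurse(t.children_left[node], depth + 1)
            lines.append("{}else:".format(indent))
            recurse(t.children_right[node], depth + 1)
        else:
            # Leaf node: return its node id.
            lines.append("{}return {}".format(indent, node))

    recurse(0, 1)
    return "\n".join(lines)


df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [3, 4, 5, 6], 'dv': [0, 1, 0, 1]})
features = ['col1', 'col2']
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df[features], df['dv'])

source = tree_to_source(dt, features)
namespace = {}
exec(source, namespace)      # defines tree(x) inside `namespace`
tree = namespace["tree"]

leaf_ids = df.apply(tree, axis=1)   # leaf node id for every row
```

The source string can also be written to a .py file and handed to other users, who then need only pandas, not the fitted model.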

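Then, once the printed tree function has been defined in your session, get the predictions (the leaf node id for each row):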

df.apply(tree, axis=1)
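The generated function returns leaf node ids, not probabilities. A minimal sketch of how those ids could be turned into class probability scores, assuming the fitted classifier is still available: dt.apply gives the leaf reached by each row, and dt.tree_.value holds the per-node class distribution. The names leaf_table and proba are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [3, 4, 5, 6], 'dv': [0, 1, 0, 1]})
features = ['col1', 'col2']
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df[features], df['dv'])

# Leaf node id reached by each row (the same ids the generated rules return).
leaf_ids = dt.apply(df[features])

# tree_.value has shape (n_nodes, n_outputs, n_classes). Depending on the
# scikit-learn version it stores class counts or class fractions, so
# normalising each row handles both cases.
node_values = dt.tree_.value[:, 0, :]
node_proba = node_values / node_values.sum(axis=1, keepdims=True)

# Lookup table: leaf id -> class probabilities (columns follow dt.classes_).
leaf_table = {int(leaf): node_proba[leaf] for leaf in np.unique(leaf_ids)}

# Probability scores for every row, looked up by leaf id.
proba = np.vstack([leaf_table[int(leaf)] for leaf in leaf_ids])
```

A table like leaf_table can be shipped alongside the extracted rules, so that a leaf id produced by the rules maps directly to a probability vector. If the model itself can be shared, dt.predict_proba gives the same scores directly.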
