KeyError: "['class'] not found in axis"



I found a tutorial on the decision tree algorithm that uses the PyXLL add-in for Excel, and I tried to run it. I get this error: KeyError: "['class'] not found in axis".

from pyxll import xl_func
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import os
@xl_func("float, int, int: object")
def ml_get_zoo_tree_2(train_size=0.75, max_depth=5, random_state=245245):
    # Load the zoo data
    dataset = pd.read_csv(os.path.join(os.path.dirname(__file__), "zoo.csv"))
    # Drop the animal names since this is not a good feature to split the data on
    dataset = dataset.drop("animal_name", axis=1)
    # Split the data into a training and a testing set
    features = dataset.drop("class", axis=1)
    targets = dataset["class"]
    train_features, test_features, train_targets, test_targets = train_test_split(
        features, targets, train_size=train_size, random_state=random_state
    )
    # Train the model
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=max_depth)
    tree = tree.fit(train_features, train_targets)
    # Add the feature names to the tree for use in predict function
    tree._feature_names = features.columns
    return tree

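If I delete lines 17 and 18, which handle the class column, I get NameError: name 'features' is not defined. If I then remove features as well, I get another error because targets must be defined.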

You need to use the correct dataset for that tutorial. You can download it, along with the code, from here: https://github.com/pyxll/pyxll-examples/tree/master/master/machine-learning

To resolve the error, I first printed the column names with print(df.columns) so I could compare them with the names used in the code.

import pandas as pd
df = pd.read_csv('your_dataset.csv')
print(df.columns)
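
The comparison can also be done in code. This is a minimal sketch: the in-memory CSV is a stand-in for zoo.csv and the `expected` list is a hypothetical set of names the tutorial code relies on. Using repr() makes stray whitespace in a header visible.

```python
import io
import pandas as pd

# Stand-in for zoo.csv (assumption): note the trailing space after "class".
csv_text = "animal_name,hair,class \naardvark,1,1\nbass,0,4\n"
df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical list of the column names the tutorial code relies on.
expected = ["animal_name", "class"]

# Names the code needs but the file does not contain under that exact spelling.
missing = [name for name in expected if name not in df.columns]

# repr() shows each header exactly as pandas read it, including spaces.
print([repr(name) for name in df.columns])
print("missing:", missing)
```

Any name reported as missing is one that drop() or column indexing would fail on with a KeyError.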

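On checking, I found a space after the class column name that was not visible when viewing the dataset. Removing that trailing space from the column header in the dataset fixed the error.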
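
If editing the file by hand is not convenient, the headers can probably be cleaned right after loading instead. This is a sketch and not part of the tutorial: the in-memory CSV is an assumed stand-in for zoo.csv with a trailing space in the class header.

```python
import io
import pandas as pd

# Stand-in for zoo.csv (assumption): the last header is "class " with a trailing space.
csv_text = "animal_name,hair,class \naardvark,1,1\nbass,0,4\n"
dataset = pd.read_csv(io.StringIO(csv_text))

# Remove leading and trailing whitespace from every column name.
dataset.columns = dataset.columns.str.strip()

# The same steps as in the tutorial function now find the "class" column.
dataset = dataset.drop("animal_name", axis=1)
features = dataset.drop("class", axis=1)
targets = dataset["class"]

print(list(features.columns))
print(targets.tolist())
```

Stripping the names in code leaves the CSV untouched, so the fix still holds if the file is downloaded or exported again.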
