KeyError when running Naive Bayes and Decision Tree classification



I want to classify the iris dataset using Naive Bayes and a decision tree. I get a KeyError that I don't understand and can't resolve.

from sklearn import datasets, naive_bayes, tree, metrics
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import random
# Get raw data and labels from the iris dataset
labelled_data = list(zip(iris_df, iris.target))
def sample_data(training_frac=0.5, iris_data=iris_df, iris_labels=iris.target):
    # separate data into training and testing sets
    training_size = int(training_frac * len(iris_data))

    training_idx = random.sample(range(0, len(iris_data)), k=training_size)
    testing_idx = [idx for idx in range(0, len(iris_data)) if idx not in training_idx]

    assert(len(training_idx) + len(testing_idx) == len(iris_data))

    training_set = [iris_data[idx] for idx in training_idx]
    training_labels = [iris_labels[idx] for idx in training_idx]
    testing_set = [iris_data[idx] for idx in testing_idx]
    testing_labels = [iris_labels[idx] for idx in testing_idx]

    return (training_set, training_labels), (testing_set, testing_labels)
# run the designated classifier
def run_classifier(classifier, training, testing):
    classifier.fit(*training)
    expect = testing[1]
    predict = classifier.predict(testing[0])

    return expect, predict
# collect data on training size plateau
def simulate():
    # progress through range of training data sizes
    nb_acc = []
    tree_acc = []
    training_fracs = [x/1000 for x in range(500, 850, 25)]
    for i in training_fracs:
        nb = naive_bayes.CategoricalNB()
        dt = tree.DecisionTreeClassifier()
        training, testing = sample_data(i)

        nb_expect, nb_predict = run_classifier(nb, training, testing)
        dt_expect, dt_predict = run_classifier(dt, training, testing)

        nb_acc.append(metrics.accuracy_score(nb_expect, nb_predict))
        tree_acc.append(metrics.accuracy_score(dt_expect, dt_predict))

    return nb_acc, tree_acc, training_fracs

nb_acc, tree_acc, fracs = simulate()

print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]}")
print(f"Decision Tree accuracy @ 50% training: {tree_acc[0]}")

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 18

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
     28     return nb_acc, tree_acc, training_fracs
     29
---> 30 nb_acc, tree_acc, fracs = simulate()
     31
     32 print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]}")

in simulate()
     18         nb = naive_bayes.CategoricalNB()
     19         dt = tree.DecisionTreeClassifier()
---> 20         training, testing = sample_data(i)
     21
     22         nb_expect, nb_predict = run_classifier(nb, training, testing)

in sample_data(training_frac, iris_data, iris_labels)
     11     assert(len(training_idx) + len(testing_idx) == len(iris_data))
     12
---> 13     training_set = [iris_data[idx] for idx in training_idx]
     14     training_labels = [iris_labels[idx] for idx in training_idx]
     15

in <listcomp>(.0)
     11     assert(len(training_idx) + len(testing_idx) == len(iris_data))
     12
---> 13     training_set = [iris_data[idx] for idx in training_idx]
     14     training_labels = [iris_labels[idx] for idx in training_idx]
     15

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083
   3084         if tolerance is not None:

KeyError: 18

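The KeyError is raised inside sample_data, on the line right after the assert (the assert itself passes). iris_data is a pandas DataFrame, so iris_data[idx] looks up a column labelled idx rather than the row at position idx. No column is labelled 18, hence KeyError: 18. Selecting rows by position with iris_data.iloc[idx] avoids it.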
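A minimal sketch of the difference between label-based column lookup and positional row lookup on a DataFrame. The small frame and its column names are made up for illustration, since the question does not show how iris_df was built:

```python
import pandas as pd

# Hypothetical stand-in for iris_df: string column labels, default integer row index.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
})

# df[1] asks for a COLUMN labelled 1, which does not exist here.
try:
    df[1]
    caught_key_error = False
except KeyError:
    caught_key_error = True

# df.iloc[1] selects the ROW at position 1.
row = df.iloc[1]

print(caught_key_error)
print(row.tolist())
```

In sample_data this would mean replacing iris_data[idx] with iris_data.iloc[idx], or passing the underlying NumPy array instead of the DataFrame.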
For the train/test split, just use:

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
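As a rough sketch, the simulation loop from the question could be rebuilt around train_test_split like this. Two things here are assumptions rather than part of the original code: X and y are taken straight from datasets.load_iris(), and GaussianNB is swapped in for CategoricalNB because the iris features are continuous measurements. The random_state values are arbitrary.

```python
from sklearn import datasets, metrics, naive_bayes, tree
from sklearn.model_selection import train_test_split

# Assumption: use the raw arrays from load_iris() instead of a DataFrame.
iris = datasets.load_iris()
X, y = iris.data, iris.target


def simulate(training_fracs):
    """Fit both classifiers once per training fraction and record test accuracy."""
    nb_acc, tree_acc = [], []
    for frac in training_fracs:
        # train_size is the fraction of samples used for training;
        # the remainder becomes the test set.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=frac, random_state=42)

        # Swapped in for CategoricalNB (assumption: continuous features).
        nb = naive_bayes.GaussianNB().fit(X_train, y_train)
        dt = tree.DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

        nb_acc.append(metrics.accuracy_score(y_test, nb.predict(X_test)))
        tree_acc.append(metrics.accuracy_score(y_test, dt.predict(X_test)))
    return nb_acc, tree_acc


fracs = [x / 1000 for x in range(500, 850, 25)]
nb_acc, tree_acc = simulate(fracs)

print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]}")
print(f"Decision Tree accuracy @ 50% training: {tree_acc[0]}")
```

Because train_test_split does the index bookkeeping itself, the hand-written sampling in sample_data, and with it the DataFrame indexing problem, is no longer needed.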
