Unexpected behavior of sklearn.datasets.make_classification

I get an unexpected error when using sklearn.datasets.make_classification, as described below.

Starting from the code "plot_classifier_comparison.py" located at http://scikit-learn.org/stable/auto_examples/plot_classifier_comparison.html, I changed the following statement (which runs fine):

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                       random_state=1, n_clusters_per_class=1)

to this (i.e., adding just one more feature):

X, y = make_classification(n_features=3, n_redundant=0, n_informative=2,
                       random_state=1, n_clusters_per_class=1)

and received the following error traceback (the path names are, of course, local to my machine):

Traceback (most recent call last):
  File "F:/Python Packages/ChartyPy3/plot_classifier_comparison.py", line 94, in <module>
    Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
  File "F:\Anaconda\lib\site-packages\sklearn\neighbors\classification.py", line 190, in predict_proba
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "F:\Anaconda\lib\site-packages\sklearn\neighbors\base.py", line 311, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1298, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn\neighbors\kd_tree.c:10427)
ValueError: query data dimension must match training data dimension

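I have established that the first two datasets (i.e., "make_moons" and "make_circles") run fine with all of the classifiers. The third dataset (i.e., "linearly_separable") does not: applying "KNeighborsClassifier(3)" to it produces the error traceback from the call to sklearn.neighbors.kd_tree.BinaryTree.query. I also tried using all of the default values for make_classification, i.e.,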

X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                       n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2,
                       weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0,
                       scale=1.0, shuffle=True, random_state=None)

but this produced the same error traceback with the same error message, i.e., "ValueError: query data dimension must match training data dimension".

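I do not understand why changing the total number of features, or simply using the default values as inputs to "make_classification", would produce this error. I am using Python 2.4.1 (64-bit implementation) and the developer's 64-bit version of scikit-learn. Any guidance on this error and/or how to resolve it would be greatly appreciated.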

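The example applies each classifier to a grid of two-dimensional points in order to plot its decision function. A classifier trained on three-dimensional input (three features) cannot be applied to two-dimensional input, so the call fails. The same holds for the default arguments, which generate 20 features.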
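To make the mismatch concrete, here is a minimal sketch that reproduces it outside the plotting script and shows one possible workaround. The grid bounds and resolution, and the choice to keep the first two columns, are assumptions made for illustration only; they are not taken from the example script. Because make_classification shuffles features by default, the first two columns are not necessarily the informative ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Three features in total, two of them informative (as in the question).
X, y = make_classification(n_features=3, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)

# A 2-D grid similar to the one the example builds for plotting.
# The bounds (-3, 3) and the 20x20 resolution are arbitrary choices.
xx, yy = np.meshgrid(np.linspace(-3, 3, 20), np.linspace(-3, 3, 20))
grid_2d = np.c_[xx.ravel(), yy.ravel()]  # shape (400, 2)

# Classifier trained on 3 features, then queried with 2 features.
clf_3d = KNeighborsClassifier(3).fit(X, y)
try:
    clf_3d.predict_proba(grid_2d)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# One possible workaround: train and plot on the same two columns.
# Keeping columns 0 and 1 is an assumption made for this sketch.
X_2d = X[:, :2]
clf_2d = KNeighborsClassifier(3).fit(X_2d, y)
Z = clf_2d.predict_proba(grid_2d)[:, 1].reshape(xx.shape)
print(Z.shape)
```

If all three features are needed for training, the decision surface can no longer be drawn directly on a two-dimensional grid; the simplest option is to leave n_features=2 for this particular plot.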
