Scikit-Learn RandomForestClassifier multi-label classification: ragged array error

Scikit-Learn's RandomForestClassifier raises an error on a multi-label classification problem.

The following code fits a multi-label RandomForestClassifier on predictors C and multi-label targets out without any error:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

C = np.array([[2,4,6],[4,2,1],[8,3,1]])
out = np.array([[0,1],[0,1],[1,0]])
rf = RandomForestClassifier(n_estimators=100, oob_score=True)
rf.fit(C,out)

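If I change the multi-labels so that every sample has the same value at one particular index, for example so that the first component of every multi-label is zero: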

out = np.array([[0,1],[0,1],[0,0]])

I get a warning and an error with this traceback:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a 
list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. 
If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
y_pred = np.array(y_pred, copy=False)
raise ValueError(
507             "The type of target cannot be used to compute OOB "
508             f"estimates. Got {y_type} while only the following are "
509             "supported: continuous, continuous-multioutput, binary, "
510             "multiclass, multilabel-indicator."
511         )
ValueError: could not broadcast input array from shape (2,1) into shape (2,)

Turning off OOB scoring avoids the error:

rf_err = RandomForestClassifier(n_estimators=100, oob_score=False)
rf_err.fit(C,out)

I don't understand why computing OOB predictions triggers this error when one label component has the same value across all n samples.

In your setting, out_err = np.array([[0,1],[0,1],[0,0]]), the first label position is never 1, so for that output the forest only ever sees a single class (0).

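This means that output has no second "class label" column, so it is dropped: its probability array has one column, while the other output's array has two. During the OOB computation these per-output arrays are stacked with np.array(y_pred, copy=False), and because their shapes differ (a ragged list) the conversion fails. That is why you see the (2,1) versus (2,) shapes in the error.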
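A minimal sketch to see the mismatch directly, assuming scikit-learn and NumPy and reusing the question's C and out_err. With OOB disabled the fit succeeds, so the fitted forest can be inspected: it reports one class for the first output and two for the second, and predict_proba on a multi-output forest returns one array per output with that many columns. The random_state value is an arbitrary choice for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

C = np.array([[2, 4, 6], [4, 2, 1], [8, 3, 1]])
out_err = np.array([[0, 1], [0, 1], [0, 0]])

# Flag label columns whose value never changes across samples
constant_cols = np.all(out_err == out_err[0], axis=0)
print(constant_cols)  # [ True False]

# OOB disabled, so fitting works and the forest can be inspected
rf = RandomForestClassifier(n_estimators=100, oob_score=False, random_state=0)
rf.fit(C, out_err)
print(rf.n_classes_)  # one entry per output: 1 class, then 2 classes

# One probability array per output, with n_classes_[k] columns each
shapes = [p.shape for p in rf.predict_proba(C)]
print(shapes)  # [(3, 1), (3, 2)]
```

Arrays of shape (n, 1) and (n, 2) cannot be combined into one regular NumPy array, which is the same kind of ragged stacking that fails inside the OOB computation.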

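Could you describe what you were originally trying to do, i.e. why you need to set a particular position in the labels to 0? If you are trying to work with N-1 classes instead of N, I suggest removing that position and the samples of that class from the dataset, rather than setting the whole position to zero: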

out=[[1,0,0],[0,1,0],[0,1,0],[0,0,1],[1,0,0]]  # 3 classes
# remove the second class:
out=[[1,0],[0,1],[1,0]]  # 2 classes
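The removal suggested above can be sketched as a small helper. drop_class is a hypothetical name, not part of scikit-learn: it keeps only the samples whose label has a 0 in position k, then deletes column k. The features X3 for the 3-class example are made-up placeholders, since the answer only gives labels. Applied to the question's out_err, removing position 0 drops no rows (no sample carries that label) and leaves a single binary target, for which oob_score=True works; the single remaining column is flattened with ravel() because scikit-learn expects a 1-D target for one output.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def drop_class(X, Y, k):
    """Hypothetical helper: drop samples labelled with class k, then column k."""
    X, Y = np.asarray(X), np.asarray(Y)
    keep_rows = Y[:, k] == 0  # samples that do not carry label k
    Y_reduced = np.delete(Y[keep_rows], k, axis=1)
    return X[keep_rows], Y_reduced


# The 3-class example from the answer, with placeholder features (assumption)
X3 = np.arange(10).reshape(5, 2)
out3 = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
X2, out2 = drop_class(X3, out3, 1)
print(out2.tolist())  # [[1, 0], [0, 1], [1, 0]]

# The question's data: label position 0 is never set
C = np.array([[2, 4, 6], [4, 2, 1], [8, 3, 1]])
out_err = np.array([[0, 1], [0, 1], [0, 0]])
C_red, y_red = drop_class(C, out_err, 0)
if y_red.shape[1] == 1:
    y_red = y_red.ravel()  # single output: pass a 1-D target

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(C_red, y_red)  # every remaining output now has the same class count
print(rf.oob_score_)
```

On real data you would keep track of which positions were removed so that predictions can be mapped back to the original label layout.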
