sklearn - Unable to call MultiLabelBinarizer's inverse_transform right after instantiation



After instantiating a MultiLabelBinarizer, I need its inverse_transform method for matrices that I build elsewhere. Unfortunately,

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=['a', 'b', 'c'])
A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
y = mlb.inverse_transform(A)

raises AttributeError: 'MultiLabelBinarizer' object has no attribute 'classes_'

I noticed that if I add this line right after instantiating mlb,

mlb.fit_transform([(c,) for c in ['a', 'b', 'c']])

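the error goes away. I guess this is because fit_transform sets the classes_ attribute, but I would expect that to happen at instantiation, since I already provided a classes parameter.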
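A minimal sketch of the behaviour described above, assuming any scikit-learn release from 0.17 onward: the constructor only stores the classes parameter, and the fitted attribute classes_ appears only after fit. The exact exception class differs by version (0.17.x raises a plain AttributeError; newer releases raise NotFittedError, which subclasses AttributeError), so the example catches AttributeError to cover both.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

classes = ['a', 'b', 'c']
A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]])

mlb = MultiLabelBinarizer(classes=classes)
# The constructor only stores the `classes` parameter;
# the fitted attribute `classes_` does not exist yet.
print(hasattr(mlb, 'classes_'))  # False

try:
    mlb.inverse_transform(A)
except AttributeError as exc:
    # AttributeError in 0.17.x, NotFittedError (an AttributeError subclass) later.
    print(type(exc).__name__)

# Fitting on a single "sample" that lists every label sets `classes_`.
# Because `classes` was given, `classes_` keeps that order.
mlb.fit([classes])
print(hasattr(mlb, 'classes_'))  # True
print(mlb.inverse_transform(A))
```

Fitting on one sample that contains every label is only a convenient way to trigger fit; the matrix A itself is never used for fitting.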

I'm using sklearn 0.17.1 and Python 2.7.6. Am I doing something wrong?

If you want the classes_ attribute to be set on a MultiLabelBinarizer instance, a quick hack like this also works:

mlb = MultiLabelBinarizer().fit([['a', 'b', 'c']])  # one sample carrying every label

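This works because, as marmouset said, only fit and fit_transform seem to set the classes_ attribute. In addition, the documentation at scikit-learn.org (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html) explicitly states that fit returns the MultiLabelBinarizer instance itself, so the call can be chained onto the constructor: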
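A short sketch of why the labels passed to fit should be wrapped in an outer list, assuming the documented input format (y is an iterable of iterables, one label set per sample). A flat list of strings only happens to work for single-character labels, because each string is iterated character by character.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# fit() returns the instance, so construction and fitting can be chained.
# A flat list of 1-char strings works by accident: 'a' is iterated as {'a'}.
ok = MultiLabelBinarizer().fit(['a', 'b', 'c'])
print(list(ok.classes_))  # ['a', 'b', 'c']

# The same trick breaks for multi-character labels:
# each string is split into its characters.
broken = MultiLabelBinarizer().fit(['cat', 'dog'])
print(list(broken.classes_))  # ['a', 'c', 'd', 'g', 'o', 't']

# Wrapping the labels in an outer list makes them one sample
# that carries every label, which works for any hashable labels.
fixed = MultiLabelBinarizer().fit([['cat', 'dog']])
print(list(fixed.classes_))  # ['cat', 'dog']
```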

def fit(self, y):
    """Fit the label sets binarizer, storing `classes_`
    Parameters
    ----------
    y : iterable of iterables
        A set of labels (any orderable and hashable object) for each
        sample. If the `classes` parameter is set, `y` will not be
        iterated.
    Returns
    -------
    self : returns this MultiLabelBinarizer instance
    """

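It appears to be implemented this way on purpose (https://github.com/scikit-learn/scikit-learn/blob/51a765a/sklearn/preprocessing/label.py#L636): .fit is the only method that defines the classes_ attribute. classes_ is not defined as a copy of classes in the constructor, and given the definition in the docstring below, it is not meant to be; you could raise this with the authors.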

class MultiLabelBinarizer(BaseEstimator, TransformerMixin):
    """Transform between iterable of iterables and a multilabel format
    Although a list of sets or tuples is a very intuitive format for multilabel
    data, it is unwieldy to process. This transformer converts between this
    intuitive format and the supported multilabel format: a (samples x classes)
    binary matrix indicating the presence of a class label.
    Parameters
    ----------
    classes : array-like of shape [n_classes] (optional)
        Indicates an ordering for the class labels
    sparse_output : boolean (default: False),
        Set to true if output binary array is desired in CSR sparse format
    Attributes
    ----------
    classes_ : array of labels
        A copy of the `classes` parameter where provided,
        or otherwise, the sorted set of classes found when fitting.
    """
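The two cases in that classes_ definition can be checked directly. A minimal sketch, assuming the behaviour documented above: without classes, classes_ is the sorted set of labels seen while fitting; with classes, it copies the parameter in the order given, and the columns of transform and inverse_transform follow that order.

```python
from sklearn.preprocessing import MultiLabelBinarizer

labels = [['c', 'a'], ['b']]

# Without `classes`, classes_ is the sorted set of labels found in y.
auto = MultiLabelBinarizer().fit(labels)
print(list(auto.classes_))  # ['a', 'b', 'c']

# With `classes`, classes_ copies that parameter in the given order,
# regardless of the order in which labels appear in y.
ordered = MultiLabelBinarizer(classes=['c', 'b', 'a']).fit(labels)
print(list(ordered.classes_))  # ['c', 'b', 'a']

# Column order of the binary matrix follows classes_.
print(ordered.transform([['a']]))  # [[0 0 1]]
```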
