How does the fit function of sklearn's Normalizer class behave?

According to the documentation, the .fit function:

fit(X, y=None): Do nothing and return the estimator unchanged.

This method is just there to implement the usual API and hence work in pipelines.

However, when I call fit, the Normalizer does become "fitted": it then expects the same number of features whenever transform is called afterwards.

For example, without calling fit, both transforms succeed:

import numpy as np

A = np.random.rand(1, 7)
B = np.random.rand(1, 5)
print("A :", A, "\n", "B :", B)
>>> A : [[0.56973872 0.74769087 0.81626309 0.03873601 0.71216399 0.31807755 0.96527768]] 
B : [[0.49805279 0.73939067 0.85949423 0.79824846 0.52750957]]
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
# fit is deliberately skipped here
#normalizer.fit(A)
a = normalizer.transform(A)
b = normalizer.transform(B)
print("a :", a, "\n", "b :", b)
>>> a : [[0.32403221 0.42524041 0.46424006 0.02203065 0.40503491 0.18090287
0.54899035]] 
b : [[0.3182623  0.47248039 0.54922815 0.5100913  0.33708558]]

But when fit is called first, a ValueError is raised:

from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
normalizer.fit(A)
a = normalizer.transform(A)
b = normalizer.transform(B)
print("a :", a, "\n", "b :", b)
>>> ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [28], in <module>
4 normalizer.fit(A)
5 a = normalizer.transform(A)
----> 6 b = normalizer.transform(B)
7 print("a :",a,"\n","b :",b)
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\preprocessing\_data.py:1948, in Normalizer.transform(self, X, copy)
1931 """Scale each non zero row of X to unit norm.
1932 
1933 Parameters
(...)
1945     Transformed array.
1946 """
1947 copy = copy if copy is not None else self.copy
-> 1948 X = self._validate_data(X, accept_sparse="csr", reset=False)
1949 return normalize(X, norm=self.norm, axis=1, copy=copy)
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\base.py:600, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
597     out = X, y
599 if not no_val_X and check_params.get("ensure_2d", True):
--> 600     self._check_n_features(X, reset=reset)
602 return out
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\base.py:400, in BaseEstimator._check_n_features(self, X, reset)
397     return
399 if n_features != self.n_features_in_:
--> 400     raise ValueError(
401         f"X has {n_features} features, but {self.__class__.__name__} "
402         f"is expecting {self.n_features_in_} features as input."
403     )
ValueError: X has 5 features, but Normalizer is expecting 7 features as input.

What exactly am I missing?

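The fit method does not learn anything, but it still validates the data by calling self._validate_data(X). Unless it is called with reset=False, that validation function by default records the number of input features (as n_features_in_) so that later inputs can be checked for consistency. transform calls it with reset=False, which is why it compares B's 5 features against the 7 recorded by fit(A) and raises.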
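A minimal sketch of that side effect, assuming a scikit-learn version that exposes the public `n_features_in_` attribute (0.24 or later). The variable names `before`, `after`, and `returned` are only for illustration:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

A = np.random.rand(1, 7)

normalizer = Normalizer()

# Before fit, no feature count has been recorded on the estimator.
before = hasattr(normalizer, "n_features_in_")

# fit learns no parameters, but input validation records the width of A.
returned = normalizer.fit(A)
after = normalizer.n_features_in_

print("recorded before fit:", before)
print("n_features_in_ after fit:", after)
print("fit returned the same object:", returned is normalizer)
```

So "returns the estimator unchanged" holds for learned parameters, but the recorded feature count is what later makes transform reject input of a different width.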

See https://github.com/scikit-learn/scikit-learn/blob/17df37aee774720212c27dbc34e6f1feef0e2482/sklearn/base.py

From the docstring of the _validate_data function:

reset : bool, default=True
Whether to reset the `n_features_in_` attribute.
If False, the input will be checked for consistency with data
provided when reset was last True

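Unfortunately, Normalizer.fit does not appear to forward keyword arguments to the data validation, so there is no way to pass reset=False through fit.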
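Two possible ways around this, sketched under the assumption that the goal is simply to L2-normalize arrays of different widths. Neither is taken from the original answer: the first uses the stateless `sklearn.preprocessing.normalize` function, the second refits the same Normalizer for each input so that the recorded feature count is reset each time.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, normalize

A = np.random.rand(1, 7)
B = np.random.rand(1, 5)

# Option 1: the function form keeps no state, so widths may differ freely.
a = normalize(A, norm="l2", axis=1)
b = normalize(B, norm="l2", axis=1)

# Option 2: refit before each transform; every fit call resets
# the recorded number of features.
normalizer = Normalizer()
a2 = normalizer.fit_transform(A)
b2 = normalizer.fit_transform(B)

print("row norms:", np.linalg.norm(a, axis=1), np.linalg.norm(b, axis=1))
print("both options agree:", np.allclose(a, a2) and np.allclose(b, b2))
```

Inside a pipeline the feature count is normally fixed anyway, so the check only gets in the way when one estimator instance is reused on unrelated arrays, as in the question.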
