How does the fit function of sklearn's Normalizer class behave?



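According to the documentation, the .fit function:

fit(X, y=None): Do nothing and return the estimator unchanged.

This method is just there to implement the usual API and hence work in pipelines.

However, when I call fit, the Normalizer does get "fitted" and expects the same number of features when transform is called afterwards.

For example: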

import numpy as np

A = np.random.rand(1, 7)
B = np.random.rand(1, 5)
print("A :", A, "\n", "B :", B)
>>> A : [[0.56973872 0.74769087 0.81626309 0.03873601 0.71216399 0.31807755 0.96527768]] 
B : [[0.49805279 0.73939067 0.85949423 0.79824846 0.52750957]]
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
#normalizer.fit(A)
a = normalizer.transform(A)
b = normalizer.transform(B)
print("a :", a, "\n", "b :", b)
>>> a : [[0.32403221 0.42524041 0.46424006 0.02203065 0.40503491 0.18090287
0.54899035]] 
b : [[0.3182623  0.47248039 0.54922815 0.5100913  0.33708558]]

However, when fit is called first, a ValueError is raised:

from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
normalizer.fit(A)
a = normalizer.transform(A)
b = normalizer.transform(B)
print("a :", a, "\n", "b :", b)
>>> ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [28], in <module>
4 normalizer.fit(A)
5 a = normalizer.transform(A)
----> 6 b = normalizer.transform(B)
7 print("a :",a,"n","b :",b)
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\preprocessing\_data.py:1948, in Normalizer.transform(self, X, copy)
1931 """Scale each non zero row of X to unit norm.
1932 
1933 Parameters
(...)
1945     Transformed array.
1946 """
1947 copy = copy if copy is not None else self.copy
-> 1948 X = self._validate_data(X, accept_sparse="csr", reset=False)
1949 return normalize(X, norm=self.norm, axis=1, copy=copy)
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\base.py:600, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
597     out = X, y
599 if not no_val_X and check_params.get("ensure_2d", True):
--> 600     self._check_n_features(X, reset=reset)
602 return out
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\base.py:400, in BaseEstimator._check_n_features(self, X, reset)
397     return
399 if n_features != self.n_features_in_:
--> 400     raise ValueError(
401         f"X has {n_features} features, but {self.__class__.__name__} "
402         f"is expecting {self.n_features_in_} features as input."
403     )
ValueError: X has 5 features, but Normalizer is expecting 7 features as input.

What exactly am I missing?

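The fit method does not learn anything, but it still validates the data by calling self._validate_data(X). Unless it is called with reset=False, the validation function records the number of input features by default so that later inputs can be checked for consistency. That is why transform(B) fails after fit(A): B has 5 features, while 7 were recorded during fit.

See https://github.com/scikit-learn/scikit-learn/blob/17df37aee774720212c27dbc34e6f1feef0e2482/sklearn/base.py

From the documentation of the _validate_data function: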

reset : bool, default=True
Whether to reset the `n_features_in_` attribute.
If False, the input will be checked for consistency with data
provided when reset was last True
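The behaviour described above can be checked with a small sketch. It assumes a scikit-learn version in which Normalizer exposes the `n_features_in_` attribute after fit, as the traceback in the question suggests; the variable names are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

A = np.random.rand(1, 7)
B = np.random.rand(1, 5)

normalizer = Normalizer()

# Before fit, no feature count has been recorded, so transform
# accepts input of any width.
has_attr_before = hasattr(normalizer, "n_features_in_")

# fit validates A and, as a side effect, remembers its column count.
normalizer.fit(A)
n_after = normalizer.n_features_in_

# transform validates with reset=False, so B (5 columns) is rejected.
try:
    normalizer.transform(B)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

So fit still learns no parameters from the data; the only state it leaves behind is the recorded input width.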

Unfortunately, Normalizer.fit does not seem to forward keyword arguments to the data validation.
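Two possible workarounds, sketched here as suggestions rather than an official recommendation: use the stateless `sklearn.preprocessing.normalize` function (the same function that transform calls in the traceback above), or call fit again before transforming data of a different width, since fit validates with the default reset=True. The L2 norm used below is the default for both.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, normalize

rng = np.random.default_rng(0)
A = rng.random((1, 7))
B = rng.random((1, 5))

# Option 1: the stateless function keeps no feature count at all.
a = normalize(A, norm="l2", axis=1)
b = normalize(B, norm="l2", axis=1)

# Option 2: refit before each new shape; fit overwrites the
# previously recorded number of features.
normalizer = Normalizer()
normalizer.fit(A)
a2 = normalizer.transform(A)
normalizer.fit(B)
b2 = normalizer.transform(B)

# Every row now has unit L2 norm, whichever option was used.
print(np.linalg.norm(a, axis=1), np.linalg.norm(b2, axis=1))
```

Simply not calling fit, as in the first snippet of the question, also works, because transform only checks the feature count when one has been recorded.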
