SUOD model raises ValueError: Input contains NaN



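I am running SUOD from pyod, which is an ensemble method, and I get this error. The models I am running are IForest, COPOD and ECOD.

Running these models individually raises no error about NaN values in the data. I have also checked every column for NaN, and none contains any. The data is one-hot encoded.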

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:  1.0min remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:  1.0min finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    5.8s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    5.8s finished
Traceback (most recent call last):
File "ensemble.py", line 76, in <module>
clf.fit(x_train_scaled)
File "/home/ubuntu/thesis/lib/python3.8/site-packages/pyod/models/suod.py", line 220, in fit
decision_score_mat, self.score_scalar_ = standardizer(
File "/home/ubuntu/thesis/lib/python3.8/site-packages/pyod/utils/utility.py", line 152, in standardizer
X = check_array(X)
File "/home/ubuntu/thesis/lib/python3.8/site-packages/sklearn/utils/validation.py", line 919, in check_array
_assert_all_finite(
File "/home/ubuntu/thesis/lib/python3.8/site-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
raise ValueError(msg_err)
ValueError: Input contains NaN.

Here is my code:

from sklearn.preprocessing import MinMaxScaler
from pyod.models.copod import COPOD
from pyod.models.ecod import ECOD
from pyod.models.iforest import IForest
from pyod.models.suod import SUOD

train_data.dropna(axis=0)
test1_data.dropna(axis=0)
test2_data.dropna(axis=0)
mm_scaler = MinMaxScaler()
x_train_scaled = mm_scaler.fit_transform(train_data)
x_test2_scaled = mm_scaler.transform(test2_data)
x_test1_scaled = mm_scaler.transform(test1_data)
detector_list = [
    COPOD(),
    IForest(n_estimators=100, max_samples=10000, max_features=10,
            bootstrap=True, n_jobs=-1, random_state=42),
    IForest(n_estimators=200, max_samples=10000, max_features=10,
            bootstrap=True, n_jobs=-1, random_state=42),
    ECOD(contamination=0.001),
]
clf = SUOD(base_estimators=detector_list, n_jobs=2, combination='average',
           verbose=False)

clf.fit(x_train_scaled)

train_pred = clf.predict(x_train_scaled)
test_pred1 = clf.predict(x_test1_scaled)
test_pred2 = clf.predict(x_test2_scaled)

Things I have tried:

  1. SimpleImputer
  2. Dropping the NaN rows
  3. Adding a mock patch

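As the error output says, you need to handle the NaN values. The dropna method returns a new DataFrame and leaves the original unchanged, so the dropna calls in your code have no effect. Either assign the result back, or set the inplace parameter to True, in which case the operation is done in place and the call returns None:

inplace: bool, default False

So, to modify the DataFrame itself, use data.dropna(axis=0, how='any', inplace=True).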
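The difference between discarding the result of dropna and actually applying it can be seen in a minimal sketch. The small frame below is a hypothetical stand-in for train_data, not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train_data (assumption).
train_data = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Calling dropna without using the return value leaves the frame unchanged.
train_data.dropna(axis=0)
rows_after_discarded_call = len(train_data)

# Option 1: keep the returned frame.
cleaned = train_data.dropna(axis=0, how="any")

# Option 2: modify the frame in place (the call itself returns None).
train_data.dropna(axis=0, how="any", inplace=True)

print(rows_after_discarded_call, len(cleaned), len(train_data))  # 3 2 2
```

Either option works; assigning the result back tends to be easier to follow because the cleaned frame has an explicit name.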

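Another possible way to handle NaN values (optional, and only if it suits your problem or a related data-mining problem) is to impute them with a column statistic. For example, df = df.fillna(df.mean()) fills each NaN with the mean of its column; use df.median() instead if you want the median.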
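As a rough illustration of the imputation option, the sketch below fills missing entries with the column mean and, alternatively, the column median. The frame and its values are made up for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column (assumption).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 8.0],
    "b": [np.nan, 10.0, 20.0, 60.0],
})

# Fill each NaN with the mean of its own column.
mean_filled = df.fillna(df.mean())

# Fill each NaN with the median of its own column instead.
median_filled = df.fillna(df.median())

print(mean_filled.loc[1, "a"], median_filled.loc[1, "a"])  # 4.0 3.0
print(mean_filled.loc[0, "b"], median_filled.loc[0, "b"])  # 30.0 20.0
```

With separate training and test sets, as in the question, it is probably safer to compute the fill values on the training data only and reuse them for the test sets, so that the test data does not influence preprocessing.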

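Another (less common) case is that your DataFrame stores missing values as strings such as "NaN". The functions that handle NaN values will not recognize them, so you first need something like df.replace("NaN", numpy.nan) and then the drop.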
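The string case can be sketched as follows, assuming a hypothetical column in which the literal text "NaN" was read in as a string:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where a missing value is stored as the text "NaN".
df = pd.DataFrame({"a": ["1.5", "NaN", "3.0"], "b": [1, 2, 3]})

# The string "NaN" is an ordinary value, so isna() does not flag it.
string_nan_detected = int(df.isna().sum().sum())

# Turn the placeholder into a real missing value, then drop those rows.
replaced = df.replace("NaN", np.nan)
cleaned = replaced.dropna(axis=0, how="any")

# Convert the remaining strings to floats for the scaler and detectors.
cleaned = cleaned.assign(a=cleaned["a"].astype(float))

print(string_nan_detected, len(cleaned), cleaned["a"].tolist())  # 0 2 [1.5, 3.0]
```

If the placeholder text varies (for example "nan" or "N/A"), each variant would need its own replacement, or the column could be converted with pd.to_numeric(..., errors="coerce"), which turns anything unparseable into NaN.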
