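I am running SUOD, the ensemble method from pyod, and I get the error below. The models I am running are IForest, COPOD, and ECOD.
Running each of these models on its own does not raise any error about NaN values in the data. I have also checked whether any column contains NaN, and none does. The data is one-hot encoded.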
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 1.0min remaining: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 1.0min finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 5.8s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 5.8s finished
Traceback (most recent call last):
File "ensemble.py", line 76, in <module>
clf.fit(x_train_scaled)
File "/home/ubuntu/thesis/lib/python3.8/site-packages/pyod/models/suod.py", line 220, in fit
decision_score_mat, self.score_scalar_ = standardizer(
File "/home/ubuntu/thesis/lib/python3.8/site-packages/pyod/utils/utility.py", line 152, in standardizer
X = check_array(X)
File "/home/ubuntu/thesis/lib/python3.8/site-packages/sklearn/utils/validation.py", line 919, in check_array
_assert_all_finite(
File "/home/ubuntu/thesis/lib/python3.8/site-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
raise ValueError(msg_err)
ValueError: Input contains NaN.
Here is my code:
train_data.dropna(axis=0)
test1_data.dropna(axis=0)
test2_data.dropna(axis=0)
mm_scaler = MinMaxScaler()
x_train_scaled = mm_scaler.fit_transform(train_data)
x_test2_scaled = mm_scaler.transform(test2_data)
x_test1_scaled = mm_scaler.transform(test1_data)
detector_list = [
    COPOD(),
    IForest(n_estimators=100, max_samples=10000, max_features=10,
            bootstrap=True, n_jobs=-1, random_state=42),
    IForest(n_estimators=200, max_samples=10000, max_features=10,
            bootstrap=True, n_jobs=-1, random_state=42),
    ECOD(contamination=0.001),
]
clf = SUOD(base_estimators=detector_list, n_jobs=2, combination='average',
           verbose=False)
clf.fit(x_train_scaled)
train_pred = clf.predict(x_train_scaled)
test_pred1 = clf.predict(x_test1_scaled)
test_pred2 = clf.predict(x_test2_scaled)
Things I have tried:
- SimpleImputer
- Dropping the NaN rows
- Adding a mock patch
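As the error output says, you need to handle the NaN values. The dropna method returns a new DataFrame and leaves the original unchanged, so the three dropna calls in your code have no effect because their results are discarded. To modify the DataFrame itself, set the inplace parameter to True; the operation is then performed in place and the method returns None:
inplace: bool, default False
So to modify the data in place, use data.dropna(axis=0, how='any', inplace=True).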
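The difference between discarding the result, reassigning it, and using inplace=True can be seen in a minimal sketch. The small train_data frame below is a hypothetical stand-in for the real data, not the asker's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train_data.
train_data = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Calling dropna without using its return value leaves the frame untouched.
train_data.dropna(axis=0)
rows_after_discarded_call = len(train_data)

# Option 1: keep the new frame that dropna returns.
cleaned = train_data.dropna(axis=0)

# Option 2: modify the frame in place; the call itself returns None.
result = train_data.dropna(axis=0, how="any", inplace=True)

print(rows_after_discarded_call, len(cleaned), len(train_data), result)
# prints: 3 2 2 None
```

Either option works; reassigning (train_data = train_data.dropna(axis=0)) is usually the easier one to read.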
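Another possible way to handle NaN values (optional, and only if it suits your problem or data-mining task) is to replace each NaN entry with the mean of its column: df = df.fillna(df.mean()).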
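A minimal sketch of column-mean imputation, assuming all columns are numeric. The frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 10.0, 20.0]})

# df.mean() computes per-column means and skips NaN by default;
# fillna then fills each column's gaps with that column's mean.
df = df.fillna(df.mean())

print(df)
```

Here the missing entry in column a becomes 2.0 and the one in column b becomes 15.0. If the median is a better fit for skewed data, df.median() can be substituted for df.mean().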
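Another (less common) case is that your DataFrame contains NaN values stored as strings, such as "NaN". The NaN-handling functions will not recognize them, so you first need something like df.replace("NaN", numpy.nan) and then drop the rows.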
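The string case can be sketched as follows. The frame is hypothetical, and the final cast to float is an extra step assumed here so that the column ends up numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where the missing value was stored as the text "NaN".
df = pd.DataFrame({"a": ["1.5", "NaN", "3.0"], "b": [1, 2, 3]})

# The string "NaN" is not treated as missing, so nothing is detected yet.
n_missing_before = int(df.isna().sum().sum())

# Turn the placeholder string into a real NaN, then drop the affected rows.
df = df.replace("NaN", np.nan)
df = df.dropna(axis=0)

# Assumed extra step: convert the remaining strings to floats.
df["a"] = df["a"].astype(float)

print(n_missing_before, len(df))
# prints: 0 2
```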