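ValueError in sklearn about NaN/inf or dtype, but I checked the data and found nothing wrong

I am trying to pass some data to sklearn, but all I get is the error "Input contains NaN, infinity or a value too large for dtype('float64')".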

The data is mostly weather data read from csv files and merged in pandas.

I have already checked for NaN and inf values and for a wrong dtype:

np.isfinite(X_train).all()
True
np.any(np.isnan(X_train))
False
X_train.dtype
dtype('float64')

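I also tried to narrow it down by using only some of the columns and rows, but I get the same error even with any two columns and a handful of rows.

Before passing the pandas DataFrame to sklearn, I converted it to a numpy array and tried reindexing it. I also wrote it out to a csv file and checked it for odd entries. None of the solutions I could find worked for me.

The code I am trying to run is: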

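from sklearn import neighbors, preprocessing
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# climate is the merged pandas DataFrame; e_bins is the target column
X = climate.drop(columns=['e_bins'])
y = climate.e_bins
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
# fit the scaler on the training split only, then apply it to both splits
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy_score(y_test, y_pred)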

I have been working on this problem for a while now without success. Any help or ideas would be appreciated. Thanks!

This is the full error I get:

ValueError                                Traceback (most recent call last)
<ipython-input-197-14fc949c5576> in <module>
17 X_test = scaler.transform(X_test)
18 knn = neighbors.KNeighborsClassifier(n_neighbors=5)
---> 19 knn.fit(X_train, y_train)
20 #y_pred = knn.predict(X_test)
21 #accuracy_score(y_test, y_pred)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
890         """
891         if not isinstance(X, (KDTree, BallTree)):
--> 892             X, y = check_X_y(X, y, "csr", multi_output=True)
893 
894         if y.ndim == 1 or y.ndim == 2 and y.shape[1] == 1:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
720     if multi_output:
721         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
--> 722                         dtype=None)
723     else:
724         y = column_or_1d(y, warn=True)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
540         if force_all_finite:
541             _assert_all_finite(array,
--> 542                                allow_nan=force_all_finite == 'allow-nan')
543 
544     if ensure_min_samples > 0:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan)
54                 not allow_nan and not np.isfinite(X).all()):
55             type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 56             raise ValueError(msg_err.format(type_err, X.dtype))
57     # for object dtype data, we only check for NaNs (GH-13254)
58     elif X.dtype == np.dtype('object') and not allow_nan:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
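
Reading the traceback above, the failing call is check_array(y, ...) inside check_X_y, so the array that trips the check appears to be the target y_train rather than X_train, which is the only array checked earlier. Below is a minimal sketch of running the same finiteness checks on both inputs. The helper name count_non_finite and the toy arrays are assumptions for illustration and are not part of the original code:

```python
import numpy as np

def count_non_finite(arr):
    """Return the number of NaN and +/-inf entries in a numeric array-like."""
    values = np.asarray(arr, dtype="float64")
    return {
        "nan": int(np.isnan(values).sum()),
        "inf": int(np.isinf(values).sum()),
    }

# Toy stand-ins for X_train and y_train (hypothetical data).
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_demo = np.array([0.0, np.nan, 1.0])  # one missing label

x_report = count_non_finite(X_demo)
y_report = count_non_finite(y_demo)
print("X:", x_report)
print("y:", y_report)
```

If the count for y is non-zero, the rows with a missing label would need to be dropped or filled before calling train_test_split.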

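I ran into this error too while computing feature importance. I found the columns that contain infinite values with the code below. Please check whether it helps.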

# getting the cols which have infinite values 
col_name = train_df.columns.to_series()[np.isinf(train_df).any()]
print(col_name)
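
A self-contained sketch of the same idea, extended to report NaN columns as well and then clean the frame. The frame demo_df and its column names are made up for illustration. Selecting the numeric columns first is a precaution, since np.isinf is not defined for object-dtype columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one clean column, one with inf, one with NaN, one non-numeric.
demo_df = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0],
    "ratio": [0.5, np.inf, 1.5],
    "rain": [0.0, np.nan, 2.0],
    "station": ["a", "b", "c"],
})

numeric = demo_df.select_dtypes(include="number")

# Names of the columns containing +/-inf and of the columns containing NaN.
inf_cols = [c for c in numeric.columns if np.isinf(numeric[c]).any()]
nan_cols = [c for c in numeric.columns if numeric[c].isna().any()]
print("inf:", inf_cols, "nan:", nan_cols)

# Turn infinities into NaN, then drop every row with a missing numeric value.
cleaned = (
    demo_df.replace([np.inf, -np.inf], np.nan)
    .dropna(subset=list(numeric.columns))
)
print(cleaned)
```

Dropping rows is only one option. Whether to drop or impute depends on how much data is affected.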
