Jupyter 笔记本中的逻辑回归;输入包含 NaN、无穷大或对于 dtype('float64')来说太大的值



我想创建一个逻辑回归模型来预测关系是已知的还是未知的,我已经将数据集中的已知值设置为1,未知值设置为0。我还添加了几个特征来训练数据和预测关系。

我已经运行了这段代码:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn.datasets
from sklearn.model_selection import train_test_split
import pandas as pd
df = pd.read_csv('boo.csv')
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
y = pd.get_dummies(df.Relationship, prefix='Relationship')
X = pd.get_dummies(df, columns=['Relationship', 'Month','Year','Victim Age', 'Perpetrator 
Age', 'Victim Sex', 'Victim Race', 'Perpetrator Sex', 'Crime Type', 'Perpetrator Race'], drop_first = True )
X_train, X_test, y_train, y_test = train_test_split(df[['Month','Year','Victim Age', 'Perpetrator Age', 'Victim Sex', 'Victim Race', 'Perpetrator Sex', 'Crime Type', 'Perpetrator Race']], df.Relationship, test_size=0.1)
model = LogisticRegression()
np.isnan(X)
np.where(np.isnan(X))
np.nan_to_num(X)
model.fit(X, y)

我遇到这个错误:

ValueError                                Traceback (most recent call last)
<ipython-input-101-68355fc70ed4> in <module>
15 np.where(np.isnan(X))
16 np.nan_to_num(X)
---> 17 model.fit(X, y)
~anaconda3libsite-packagessklearnlinear_model_logistic.py in fit(self, X, y, sample_weight)
1342             _dtype = [np.float64, np.float32]
1343 
-> 1344         X, y = self._validate_data(X, y, accept_sparse='csr', dtype=_dtype,
1345                                    order="C",
1346                                    accept_large_sparse=solver != 'liblinear')

~anaconda3libsite-packagessklearnbase.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
431                 y = check_array(y, **check_y_params)
432             else:
--> 433                 X, y = check_X_y(X, y, **check_params)
434             out = X, y
435 
~anaconda3libsite-packagessklearnutilsvalidation.py in inner_f(*args, **kwargs)
61             extra_args = len(args) - len(all_args)
62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
64 
65             # extra_args > 0
~anaconda3libsite-packagessklearnutilsvalidation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
812         raise ValueError("y cannot be None")
813 
--> 814     X = check_array(X, accept_sparse=accept_sparse,
815                     accept_large_sparse=accept_large_sparse,
816                     dtype=dtype, order=order, copy=copy,
~anaconda3libsite-packagessklearnutilsvalidation.py in inner_f(*args, **kwargs)
61             extra_args = len(args) - len(all_args)
62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
64 
65             # extra_args > 0
~anaconda3libsite-packagessklearnutilsvalidation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
661 
662         if force_all_finite:
--> 663             _assert_all_finite(array,
664                                allow_nan=force_all_finite == 'allow-nan')
665 
~anaconda3libsite-packagessklearnutilsvalidation.py in _assert_all_finite(X, allow_nan, msg_dtype)
101                 not allow_nan and not np.isfinite(X).all()):
102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
104                     msg_err.format
105                     (type_err,
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

我查看了几个stackoverflow问题并尝试了他们的解决方案,但似乎没有任何工作。只有当我尝试在模型中拟合X和y时,才会出现此错误。

您需要修复数据集中的'nan'值。您正在检查nan值,但没有修复它。以下是示例数据。

X = np.concatenate((np.arange(1,15),[np.NaN,np.NaN]))
print(np.isnan(X), np.where(np.isnan(X)), np.nan_to_num(X))

输出:

[False False False False False False False False False False False False
False False  True  True] 
(array([14, 15]),) 
[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14.  0.  0.]

您需要执行下一步并将其分配回去。代码不做"原地"赋值。如果你不赋值它,原始数组就不会发生变化。

X = np.nan_to_num(X) # assign it

现在,如果你再次检查X,就不会有'nan'值,然后继续训练模型。

相关内容

最新更新