Scikit-learn RandomForestClassifier error



I am using Python 3.5, with NumPy, SciPy, and matplotlib installed and imported.

When I try:

# Import the random forest package
from sklearn.ensemble import RandomForestClassifier
# Create the random forest object which will include all the parameters
# for the fit
forest = RandomForestClassifier(n_estimators = 1)
# Fit the training data to the Survived labels and create the decision trees
forest = forest.fit(train_data[0::,1::],train_data[0::,0])
# Take the same decision trees and run it on the test data
output = forest.predict(test_data)

(test_data and train_data are both float arrays), I get the following error:

C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\fixes.py:64: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  if 'order' in inspect.getargspec(np.copy)[0]:
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
Traceback (most recent call last):
  File "C:/Users/Uri/PycharmProjects/titanic1/fdsg.py", line 54, in <module>
    output = forest.predict(test_data)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\ensemble\forest.py", line 461, in predict
    X = check_array(X, ensure_2d=False, accept_sparse="csr")
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 352, in check_array
    _assert_all_finite(array)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 52, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Process finished with exit code 1
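
The traceback shows the ValueError being raised inside forest.predict(test_data), so the offending values are most likely in test_data rather than in train_data. A minimal sketch for locating them, assuming test_data is a 2-D float NumPy array (the small array below is a made-up stand-in, not the asker's data):

```python
import numpy as np

# Hypothetical stand-in for test_data: a 2-D float array with two bad entries.
test_data = np.array([[3.0, 22.0, 7.25],
                      [1.0, np.nan, 71.28],
                      [3.0, 26.0, np.inf]])

# True wherever a value is NaN or +/- infinity.
bad_mask = ~np.isfinite(test_data)

# (row, column) index pairs of the offending cells.
bad_cells = np.argwhere(bad_mask)

# Number of non-finite values in each column.
bad_per_column = bad_mask.sum(axis=0)

print(bad_cells)
print(bad_per_column)
```

Knowing which columns hold the non-finite values helps decide whether to impute them, clip them, or drop the affected rows.
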
# Minimal reproduction of the error, with a fix for each of its two causes
from sklearn.ensemble import RandomForestClassifier
# Imputer was deprecated in scikit-learn 0.20 and removed in 0.22;
# newer versions provide sklearn.impute.SimpleImputer instead
from sklearn.preprocessing import Imputer
import numpy as np
X = np.random.randint(0, (2**31)-1, (500, 4)).astype(object)
y = np.random.randint(0, 2, 500)
clf = RandomForestClassifier()
print(X.max())
clf.fit(X, y) # OK
print("First fit OK")
# Case 1: the data contains null (NaN) values
X[0,0] = np.nan # replace one of the cells with a null value
#clf.fit(X, y) # raises the same ValueError
# To handle NaN values, you can use the Imputer class:
imp = Imputer(strategy='median')
X_ok = imp.fit_transform(X)
clf.fit(X_ok, y)
# Case 2: the data contains huge integers
X[0,0] = 2**128 # a huge integer triggers the same error
#clf.fit(X, y) # raises the same ValueError
# To handle this, you can clip the values to some cap
X_ok = X.clip(-2**63, 2**63) # 2**63 is only an example; choose a cap that makes sense for your application
clf.fit(X_ok, y)
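
On current scikit-learn releases the Imputer class used above is no longer available. The following is a sketch of the same median-imputation idea with sklearn.impute.SimpleImputer, applied to the train/predict flow from the question. The arrays X_train, y_train and X_test are randomly generated placeholders and are assumptions, not the asker's Titanic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Placeholder data (assumption): 100 training rows, 10 test rows, 3 features.
rng = np.random.RandomState(0)
X_train = rng.rand(100, 3)
y_train = rng.randint(0, 2, 100)
X_test = rng.rand(10, 3)
X_test[0, 1] = np.nan  # inject a missing value into the test set

# Learn the per-column medians on the training data only,
# then apply those same medians to the test data.
imp = SimpleImputer(strategy="median")
X_train_ok = imp.fit_transform(X_train)
X_test_ok = imp.transform(X_test)

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_train_ok, y_train)
output = forest.predict(X_test_ok)  # no ValueError: the NaN has been filled
```

Fitting the imputer on the training set and only transforming the test set keeps test-set statistics out of the model, which is usually the safer default.
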
