sklearn's PLSRegression: "ValueError: array must not contain infs or NaNs"

When using sklearn.cross_decomposition.PLSRegression:

import numpy as np
import sklearn.cross_decomposition
pls2 = sklearn.cross_decomposition.PLSRegression()
xx = np.random.random((5,5))
yy = np.zeros((5,5))
yy[0,:] = [0,1,0,0,0]
yy[1,:] = [0,0,0,1,0]
yy[2,:] = [0,0,0,0,1]
#yy[3,:] = [1,0,0,0,0] # Uncommenting this line solves the issue
pls2.fit(xx, yy)

I get:

C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:44: RuntimeWarning: invalid value encountered in divide
  x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score)
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:64: RuntimeWarning: invalid value encountered in less
  if np.dot(x_weights_diff.T, x_weights_diff) < tol or Y.shape[1] == 1:
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:67: UserWarning: Maximum number of iterations reached
  warnings.warn('Maximum number of iterations reached')
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:297: RuntimeWarning: invalid value encountered in less
  if np.dot(x_scores.T, x_scores) < np.finfo(np.double).eps:
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:275: RuntimeWarning: invalid value encountered in less
  if np.all(np.dot(Yk.T, Yk) < np.finfo(np.double).eps):
Traceback (most recent call last):
  File "C:\svn\hw4\code\test_plsr2.py", line 8, in <module>
    pls2.fit(xx, yy)
  File "C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py", line 335, in fit
    linalg.pinv(np.dot(self.x_loadings_.T, self.x_weights_)))
  File "C:\Anaconda\lib\site-packages\scipy\linalg\basic.py", line 889, in pinv
    a = _asarray_validated(a, check_finite=check_finite)
  File "C:\Anaconda\lib\site-packages\scipy\_lib\_util.py", line 135, in _asarray_validated
    a = np.asarray_chkfinite(a)
  File "C:\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 613, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

What could be the issue?

I am aware of scikit-learn GitHub issue #2089, but since I use scikit-learn 0.16.1 (with Python 2.7.10 x64), that issue should already be fixed (the code snippet mentioned in the GitHub issue works fine).

Check whether the values you pass in contain NaN or inf:

np.isnan(xx).any()
np.isnan(yy).any()
np.isinf(xx).any()
np.isinf(yy).any()

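If any of these returns True, remove or replace the NaN or inf entries. For example, you can replace them with np.nan_to_num, which sets NaN to 0 and, by default, +inf and -inf to the largest and smallest finite floats: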

xx = np.nan_to_num(xx)
yy = np.nan_to_num(yy)
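Because np.nan_to_num turns infinities into huge finite values rather than 0, those values can still overflow later computations. A boolean mask built with np.isfinite is one way to zero out every non-finite entry instead. A minimal sketch comparing the two; zero_non_finite is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

a = np.array([1.0, np.nan, np.inf, -np.inf])

# nan_to_num: NaN -> 0.0, +inf -> largest float64, -inf -> most negative float64
print(np.nan_to_num(a))


def zero_non_finite(arr):
    """Hypothetical helper: return a float copy with every NaN/+inf/-inf set to 0."""
    out = np.array(arr, dtype=float)  # np.array copies, so the caller's array is untouched
    out[~np.isfinite(out)] = 0.0
    return out


print(zero_non_finite(a))  # [1. 0. 0. 0.]
```

Which one you want depends on whether an infinity should mean "very large" or "missing" in your data.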

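It is also possible that numpy is being fed such extreme values (very large positive or negative numbers, or zeros) that the equations inside the library produce zero, NaN or Inf. Oddly, one workaround is to pass smaller numbers (for example, representative values between -1 and 1). One way to achieve this is standardization, see: https://stackoverflow.com/a/36390482/445131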
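Standardizing a column means subtracting its mean and dividing by its standard deviation, which brings every column into a small, comparable range. A minimal NumPy sketch; standardize is a hypothetical helper, and columns with zero variance are only centered, since dividing them by 0 would itself produce NaN. scikit-learn's sklearn.preprocessing.StandardScaler does the same job if you prefer a library class.

```python
import numpy as np


def standardize(X):
    """Hypothetical helper: z-score each column; constant columns are only centered."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # avoid 0/0 -> NaN for constant columns
    return (X - mean) / std


# made-up data: one column with huge magnitudes, one constant column
X = np.array([[1e12, 5.0], [3e12, 5.0], [5e12, 5.0]])
Z = standardize(X)
print(Z)
```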

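If none of this solves the problem, you may be dealing with a low-level bug in the library you are using, or with some kind of singularity in your data. Create an SSCCE and post it on Stack Overflow, or file a new bug report with the library's maintainers.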

The issue is caused by a bug in scikit-learn. I reported it on GitHub: https://github.com/scikit-learn/scikit-learn/issues/2089#issuecomment-152753095
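The first RuntimeWarning in the question points at the weight update x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score). If y_score is a zero vector, that division is 0/0, every weight becomes NaN, and the NaNs eventually reach the scipy pinv call that raises. The sketch below assumes the NIPALS loop seeds y_score with the first column of the centered Y; that assumption is consistent with the question, where the first column of yy is all zeros unless the commented-out line is restored. first_mode_a_update is a hypothetical re-creation for illustration, not scikit-learn's code, and it skips the scaling step.

```python
import numpy as np


def first_mode_a_update(X, Y):
    """Hypothetical re-creation of the first NIPALS mode-A weight update.

    Assumption: y_score is seeded with the first column of the centered Y.
    Scaling by the column standard deviation is omitted for brevity.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    y_score = Yc[:, [0]]
    with np.errstate(divide="ignore", invalid="ignore"):  # silence the 0/0 warning
        return np.dot(Xc.T, y_score) / np.dot(y_score.T, y_score)


rng = np.random.RandomState(0)
xx = rng.random_sample((5, 5))
yy = np.zeros((5, 5))
yy[0, :] = [0, 1, 0, 0, 0]
yy[1, :] = [0, 0, 0, 1, 0]
yy[2, :] = [0, 0, 0, 0, 1]
print(np.isnan(first_mode_a_update(xx, yy)).all())  # True: first column of yy is all zeros

yy[3, :] = [1, 0, 0, 0, 0]  # the line the question says fixes the issue
print(np.isfinite(first_mode_a_update(xx, yy)).all())  # True
```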

I found a small, hacky workaround that worked for me.

I'm doing time series analysis with cesium, using this code:

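import numpy as np
from cesium import featurize

# timeData, data and featuresToUse come from the surrounding code
timeInput = np.array(timeData)
valueInput = np.array(data)
#Featurizing Data
featurizedData = featurize.featurize_time_series(times=timeInput,
                                                 values=valueInput,
                                                 errors=None,
                                                 features_to_use=featuresToUse)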

This led to this error:

ValueError: array must not contain infs or NaNs

Just for kicks, I checked the length and type of the data:

data:
70
<class 'numpy.int32'>
timeData: 
70
<class 'numpy.float64'>

I decided to try converting the data type with this line:

valueInput = valueInput.astype(float)

That worked, giving this code:

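import numpy as np
from cesium import featurize

# timeData, data and featuresToUse come from the surrounding code
timeInput = np.array(timeData)
valueInput = np.array(data)
valueInput = valueInput.astype(float)  # int32 -> float64, matching timeInput
#Featurizing Data
featurizedData = featurize.featurize_time_series(times=timeInput,
                                                 values=valueInput,
                                                 errors=None,
                                                 features_to_use=featuresToUse)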

If you run into an error like this, check that the data types of your inputs match.
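One way to apply that advice is to cast every input to a common floating-point dtype up front and confirm the result is finite. A minimal sketch; as_float_inputs is a hypothetical helper, float64 is an assumed target dtype (it is what timeData already uses in the output above), and the arrays are made-up stand-ins for the answer's data:

```python
import numpy as np


def as_float_inputs(*arrays):
    """Hypothetical helper: cast each input to float64 and verify it is finite."""
    out = []
    for a in arrays:
        a = np.asarray(a).astype(np.float64)
        if not np.isfinite(a).all():
            raise ValueError("input contains NaN or inf after casting")
        out.append(a)
    return out


data = np.arange(70, dtype=np.int32)  # stand-in for the int32 values
timeData = np.linspace(0.0, 1.0, 70)   # stand-in for the float64 times
timeInput, valueInput = as_float_inputs(timeData, data)
print(timeInput.dtype, valueInput.dtype)  # float64 float64
```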

I could reproduce the same error, and I got rid of it by filtering out all the zeros:

threshold_for_bug = 0.00000001 # could be any value, ex numpy.min
xx[xx < threshold_for_bug] = threshold_for_bug

This silences the error (I never checked how much precision changes).

My system info:

numpy-1.11.2
python-3.5
macOS Sierra
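The threshold filter above replaces every entry below threshold_for_bug, negative values included, not only the zeros. If only exact zeros should be nudged, a narrower mask is enough. A minimal NumPy sketch comparing the two on a made-up 2x2 array, using the answer's example threshold:

```python
import numpy as np

threshold_for_bug = 0.00000001
xx = np.array([[-2.0, 0.0], [0.5, 3.0]])

clipped = xx.copy()
clipped[clipped < threshold_for_bug] = threshold_for_bug  # the answer's approach
# same result as np.maximum(xx, threshold_for_bug)

zeros_only = xx.copy()
zeros_only[zeros_only == 0.0] = threshold_for_bug  # leaves -2.0 untouched

print(clipped)
print(zeros_only)
```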

You may also want to check whether your weights contain negative values, since negative weights can also trigger this error.

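I ran into a similar problem when doing MCA with the prince library. My solution was to use the "object" dtype instead of the "category" dtype. Very frustrating, since I spent many hours looking for a solution.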
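In pandas terms, that means converting the categorical columns to plain object columns before handing the DataFrame to prince. A minimal pandas sketch; the DataFrame contents are made up for illustration, and nothing here calls prince itself:

```python
import pandas as pd

df = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"]),
    "size": pd.Categorical(["S", "M", "L"]),
    "count": [1, 2, 3],
})

# convert every "category" column to "object"; other columns are left alone
cat_cols = df.select_dtypes(include="category").columns
df[cat_cols] = df[cat_cols].astype(object)
print(df.dtypes)
```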
