Getting "ValueError: Input contains NaN" when using the pmdarima package



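When I try to predict the next value of a series with the ARIMA model from pmdarima, I get the error ValueError: Input contains NaN.

However, the data I am using contains no null values.

Code: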

import pandas as pd
from pmdarima.arima import ARIMA

tmp_series = pd.Series([0.8867208063423082, 0.4969678051201152, -0.35079875681211814, 0.07156197743204402, 0.6888394890593726, 0.6136916470350972, 0.9020102952782968, 0.38539523911177426, -0.02211092685162178, 0.7051282791422511, -0.21841121961990842, 0.003262841037836234, 0.3970253153400027, 0.8187445259415379, -0.525847439014037, 0.3039480910711944, 0.0279240073596233, 0.8238419467739897, 0.8234157376839023, 0.5897892005398399, 0.8333118174945449])
# Fit ARIMA(2, 1, 1) on every observation except the last one
model_211 = ARIMA(order=(2, 1, 1), out_of_sample_size=0, mle_regression=True, suppress_warnings=True)
model_211.fit(tmp_series[:-1])
# This call raises the ValueError
print(model_211.predict())
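
To back up the claim that the input itself is clean, the values can be checked for NaN and infinity before fitting. This is a minimal sketch using only the standard library; `non_finite_positions` is a hypothetical helper name, not part of pmdarima, and the sample data is made up for illustration:

```python
import math

def non_finite_positions(values):
    """Return the positions of entries that are NaN or +/-infinity."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]

# Hypothetical sample with two deliberately bad entries.
sample = [0.8867, float("nan"), -0.3508, float("inf")]
print(non_finite_positions(sample))  # [1, 3]
```

Calling it on `tmp_series[:-1]` should return an empty list, which would suggest the NaN appears during prediction rather than in the input.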

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [7], in <cell line: 7>()
5 display(model_211.params())
6 display(model_211.aic())
----> 7 display(model_211.predict())
File /usr/local/lib/python3.8/dist-packages/pmdarima/arima/arima.py:793, in ARIMA.predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
790 arima = self.arima_res_
791 end = arima.nobs + n_periods - 1
--> 793 f, conf_int = _seasonal_prediction_with_confidence(
794     arima_res=arima,
795     start=arima.nobs,
796     end=end,
797     X=X,
798     alpha=alpha)
800 if return_conf_int:
801     # The confidence intervals may be a Pandas frame if it comes from
802     # SARIMAX & we want Numpy. We will to duck type it so we don't add
803     # new explicit requirements for the package
804     return f, check_array(conf_int, force_all_finite=False)
File /usr/local/lib/python3.8/dist-packages/pmdarima/arima/arima.py:205, in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
202     conf_int[:, 1] = f + q * np.sqrt(var)
204 y_pred = check_endog(f, dtype=None, copy=False, preserve_series=True)
--> 205 conf_int = check_array(conf_int, copy=False, dtype=None)
207 return y_pred, conf_int
File /usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
893         raise ValueError(
894             "Found array with dim %d. %s expected <= 2."
895             % (array.ndim, estimator_name)
896         )
898     if force_all_finite:
--> 899         _assert_all_finite(
900             array,
901             input_name=input_name,
902             estimator_name=estimator_name,
903             allow_nan=force_all_finite == "allow-nan",
904         )
906 if ensure_min_samples > 0:
907     n_samples = _num_samples(array)
File /usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
124         if (
125             not allow_nan
126             and estimator_name
(...)
130             # Improve the error message on how to handle missing values in
131             # scikit-learn.
132             msg_err += (
133                 f"\n{estimator_name} does not accept missing values"
134                 " encoded as NaN natively. For supervised learning, you might want"
(...)
144                 "#estimators-that-handle-nan-values"
145             )
--> 146         raise ValueError(msg_err)
148 # for object dtype data, we only check for NaNs (GH-13254)
149 elif X.dtype == np.dtype("object") and not allow_nan:
ValueError: Input contains NaN.

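So I have two questions:

  1. Is there a parameter I should set to avoid this error?

  2. I found a similar question: "Unable to predict the next value using ARIMA: Input contains NaN, infinity or a value too large for dtype('float64')". A comment on that post says it is caused by an unresolved issue.

    I am not sure whether my error has the same cause. If it does, can anyone suggest another ARIMA package?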


Environment info:

  • I run this code in a Docker container
    • OS info:
      Distributor ID: Ubuntu
      Description:    Ubuntu 20.04.4 LTS
      Release:        20.04
      Codename:       focal
      
    • Python environment:
      Python 3.8.10
      
    • pip packages (only the relevant ones are listed; the full pip package list is here):
      Package                      Version                                                                            
      ---------------------------- --------------------                                                                        
      numpy                        1.22.4
      pandas                       1.4.3 
      pmdarima                     2.0.1   
      scikit-learn                 1.1.1                           
      scipy                        1.8.1
      statsmodels                  0.13.2 
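
A version listing like the one above can also be collected from inside the running interpreter, which helps when comparing environments. This is a minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8); `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

relevant = ["numpy", "pandas", "pmdarima", "scikit-learn", "scipy", "statsmodels"]
for name, version in installed_versions(relevant).items():
    print(f"{name:<28} {version}")
```

Packages that are not installed show up as `None` instead of raising an error.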
      

What environment are you working in? Your code works for me and prints:

20 0.31694221 0.33824822 0.378482…

Downgrading to the following package versions resolves this error:

numpy==1.19.3
pandas==1.3.3
pmdarima==1.8.3
