Hierarchical forecasting in Python with scikit-hts raises an "Invalid frequency" error



I am using scikit-hts. Here is some code with a small DataFrame and a hierarchy to get started (after running pip install scikit-hts[auto_arima]):

import hts
import pandas as pd
hierarchy_df = hierarchy_df_test = pd.DataFrame({'date':['1998-01-01', '1998-02-01', '1998-03-01', '1998-04-01', '1998-05-01', '1998-06-01', '1998-07-01', '1998-08-01', '1998-09-01', '1998-10-01', '1998-11-01', '1998-12-01', '1999-01-01', '1999-02-01'], 'total': [21, 40, 31, 21, 29, 40, 30, 21, 24, 30, 40, 22, 32, 32], 'A':[10,20,15,10,14,20,16,10,12,16,20,10,18,16], 'B':[11,20,16,11,15,20,14,11,12,14,20,12,14,16]})
hierarchy = {'total': ['A', 'B']}

I want the dates to be datetime objects, so I run:

hierarchy_df['date'] = pd.to_datetime(hierarchy_df['date'], format='%Y-%m-%d')

Now I fit the model, using auto_arima as the model and "OLS" as the revision method:

model_ols_arima = hts.HTSRegressor(model='auto_arima', revision_method='OLS', n_jobs=0)
model_ols_arima = model_ols_arima.fit(hierarchy_df, hierarchy)

Everything works fine until I try to predict:

pred_ols_arima = model_ols_arima.predict(steps_ahead=4)

At this point I get "ValueError: Invalid frequency: 1".

Here is the full traceback:

TypeError                                 Traceback (most recent call last)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _get_predict_index(self, steps_ahead)
369         try:
--> 370             start = self.nodes.item.index[-1] + timedelta(freq)
371             end = self.nodes.item.index[-1] + timedelta(steps_ahead * freq)
TypeError: unsupported operand type(s) for +: 'int' and 'datetime.timedelta'
During handling of the above exception, another exception occurred:
ValueError                                Traceback (most recent call last)
<command-2268273946901309> in <module>
----> 1 pred_ols_arima = model_ols_arima.predict(steps_ahead=4)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in predict(self, exogenous_df, steps_ahead, distributor, disable_progressbar, show_warnings, **predict_kwargs)
350             self.hts_result.errors = (key, error)
351             self.hts_result.residuals = (key, residual)
--> 352         return self._revise(steps_ahead=steps_ahead)
353 
354     def _revise(self, steps_ahead: int = 1) -> pandas.DataFrame:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _revise(self, steps_ahead)
361 
362         revised_columns = list(make_iterable(self.nodes))
--> 363         revised_index = self._get_predict_index(steps_ahead=steps_ahead)
364         return pandas.DataFrame(revised, index=revised_index, columns=revised_columns)
365 
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _get_predict_index(self, steps_ahead)
374             start = self.nodes.item.index[-1] + freq
375             end = self.nodes.item.index[-1] + (steps_ahead * freq)
--> 376             future = pandas.date_range(freq=freq, start=start, end=end)
377 
378         return self.nodes.item.index.append(future)
/databricks/python/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in date_range(start, end, periods, freq, tz, normalize, name, closed, **kwargs)
1067         freq = "D"
1068 
-> 1069     dtarr = DatetimeArray._generate_range(
1070         start=start,
1071         end=end,
/databricks/python/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in _generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, closed)
375                 "and freq, exactly three must be specified"
376             )
--> 377         freq = to_offset(freq)
378 
379         if start is not None:
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()
ValueError: Invalid frequency: 1

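I did some research, and the problem seems to be related to the frequency of the dates, but I have not been able to resolve it. The tutorial here does more or less the same thing (with a larger dataset) and runs without errors. Any help would be appreciated. Thanks.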

I resolved the error by setting the date column as the index:

hierarchy_df = hierarchy_df.set_index('date')

After doing this, the predict() line ran without any errors.
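As a rough illustration of why the index matters, the sketch below uses only pandas, not scikit-hts. The traceback suggests that the library builds the forecast dates from the last index label of the fitted data, and without set_index that label comes from the default integer RangeIndex rather than from the dates. The variable names here are hypothetical, and the monthly dates mirror the ones in the question.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Same monthly dates as in the question (14 month starts from 1998-01-01).
dates = pd.date_range("1998-01-01", periods=14, freq="MS")
df = pd.DataFrame({"date": dates, "total": range(14)})

# Without set_index, the frame has a default integer index,
# so its last label is a number, not a timestamp.
has_range_index = isinstance(df.index, pd.RangeIndex)

# With the dates as the index, pandas can infer the frequency
# and step forward from the last observed timestamp.
indexed = df.set_index("date")
inferred = pd.infer_freq(indexed.index)
first_future = indexed.index[-1] + to_offset(inferred)
future = pd.date_range(start=first_future, periods=4, freq=inferred)

print(has_range_index)
print(inferred)
print(list(future.strftime("%Y-%m-%d")))
```

Checking type(hierarchy_df.index) before calling fit is a quick way to confirm that the frame carries a DatetimeIndex rather than a RangeIndex.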
