Cannot update statsmodels SARIMAX with new observations (ValueError)



I'm trying to do out-of-sample validation on a time series dataset, using scikit-learn's TimeSeriesSplit() to create the train/test folds.

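The idea is to fit statsmodels' SARIMAX on the train fold and then validate on the test fold without refitting the model. To do that, we have to append the new observations from the test fold to the model iteratively, one at a time, before each forecast.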

However, I get a ValueError at the append step: ValueError: Given `endog` does not have an index that extends the index of the model.

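This makes no sense to me. If I run print(max(train_fold.index), min(test_fold.index)) for each fold, it's clear that the last index of the train fold comes right before the first index of the test fold. In my case: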

1983-05 1983-06
1984-05 1984-06
1985-05 1985-06
1986-05 1986-06
1987-05 1987-06

Here is the full code so far. I'm sure I'm doing something stupid, but I'm stuck:

# Create a generator that yields the indices of our train and test folds
split = TimeSeriesSplit(n_splits=5).split(train_series)
# Loop through each fold
for train_idcs, test_idcs in split:
    # Create an empty prediction list to append to
    predictions = []
    # Create the folds
    train_fold = train_series[train_idcs]
    test_fold = train_series[test_idcs]
    # Fit the model on the training fold
    model_instance = sm.tsa.statespace.SARIMAX(
        train_fold,
        order=(1, 0, 0),
        seasonal_order=(1, 0, 0, 12),
        simple_differencing=True,
        enforce_stationarity=False,
        enforce_invertibility=False,
    )
    model_fitted = model_instance.fit(disp=False)
    # Create the initial prediction
    pred = model_fitted.forecast(steps=1)[
        0
    ]  # Slice so we just get the forecast value only
    predictions.append(pred)
    # Now loop through the test set, adding observations individually,
    # and getting the next prediction
    for i in range(len(test_fold)):
        # Get the next row
        next_row = test_fold.iloc[
            i : i + 1
        ]  # Returns single row but in series form (which statsmodels expects)
        # Append the row to the model
        model_fitted.append(next_row, refit=False)
        # Get the new prediction
        pred = model_fitted.forecast(steps=1)[
            0
        ]  # Slice so we just get the forecast value only
        predictions.append(pred)
    print(predictions)

model_fitted.append(next_row, refit=False) is the point of failure. Any ideas? Thanks

Got it! This was silly.

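The .append() method on a fitted SARIMAX results object returns a new results object that includes the extra observations; it does not change the data stored in the object you call it on. Because my loop discarded the return value, model_fitted still ended at the last training month. The first append went through, but the second row no longer directly followed the model's index, which is what the ValueError was complaining about.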

So the correct code is simply: model_fitted = model_fitted.append(next_row, refit=False)
