PyTorch Forecasting: loading a custom dataset



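I am trying to load a custom dataset into PyTorch Forecasting by modifying the example given in this GitHub repository. However, I am stuck at instantiating the TimeSeriesDataSet. The relevant part of the code is as follows: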

import numpy as np
import pandas as pd
df = pd.read_csv("data.csv")
print(df.shape) # (300, 8)
# Divide the timestamps so that they are incremented by one each row.
df["unix"] = df["unix"].apply(lambda n: int(n / 86400))
# Set "unix" as the index
#df = df.set_index("unix")
# Add *integer* indices.
df["index"] = np.arange(300)
df = df.set_index("index")
# Add group column.
df["group"] = np.repeat(np.arange(30), 10)
from pytorch_forecasting import TimeSeriesDataSet
target = ["foo", "bar", "baz"]
# Create the dataset from the pandas dataframe
dataset = TimeSeriesDataSet(
    df,
    group_ids                  = ["group"],
    target                     = target,
    time_idx                   = "unix",
    min_encoder_length         = 50,
    max_encoder_length         = 50,
    min_prediction_length      = 20,
    max_prediction_length      = 20,
    time_varying_unknown_reals = target,
    allow_missing_timesteps    = True
)

And the error message plus traceback:

/home/user/.virtualenvs/torch/lib/python3.9/site-packages/pytorch_forecasting/data/timeseries.py:1241: UserWarning: Min encoder length and/or min_prediction_idx and/or min prediction length and/or lags are too large for 30 series/groups which therefore are not present in the dataset index. This means no predictions can be made for those series. First 10 removed groups: [{'__group_id__group': 0}, {'__group_id__group': 1}, {'__group_id__group': 2}, {'__group_id__group': 3}, {'__group_id__group': 4}, {'__group_id__group': 5}, {'__group_id__group': 6}, {'__group_id__group': 7}, {'__group_id__group': 8}, {'__group_id__group': 9}]
warnings.warn(
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_822/3402560775.py in <module>
4 
5 # create the dataset from the pandas dataframe
----> 6 dataset = TimeSeriesDataSet(
7     df,
8     group_ids                  = ["group"],
~/.virtualenvs/torch/lib/python3.9/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, constant_fill_strategy, allow_missing_timesteps, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
437 
438         # create index
--> 439         self.index = self._construct_index(data, predict_mode=predict_mode)
440 
441         # convert to torch tensor for high performance data loading later
~/.virtualenvs/torch/lib/python3.9/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
1247                 UserWarning,
1248             )
-> 1249         assert (
1250             len(df_index) > 0
1251         ), "filters should not remove entries all entries - check encoder/decoder lengths and lags"
AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags

I have tried tweaking the initialization parameters, but to no avail. The file timeseries.py can be found in the same GitHub repository, here.

As far as I can tell, this probably happens because not all of the time series reach the minimum required length (min_prediction_length + min_encoder_length).

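In your example, each time series should have a length of at least 70 (50 + 20), whereas np.repeat(np.arange(30), 10) creates 30 groups of only 10 rows each. This matches the warning, which reports that all 30 series/groups were removed.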
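To check this before building the dataset, here is a minimal sketch that uses only pandas and NumPy. It uses a stand-in dataframe with made-up values in place of data.csv, and it assumes that the length of a series is measured as its span along the time_idx column. The names required, span and too_short are just illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for data.csv (made-up values): 300 daily rows, 30 groups of 10 rows,
# mirroring the "unix" and "group" columns built in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "unix": np.arange(300),
    "foo": rng.random(300),
    "group": np.repeat(np.arange(30), 10),
})

min_encoder_length = 50
min_prediction_length = 20
required = min_encoder_length + min_prediction_length

# Span of each group along the time index (last - first + 1).
span = df.groupby("group")["unix"].agg(lambda s: s.max() - s.min() + 1)

# Groups whose span cannot hold one encoder window plus one prediction window.
too_short = span[span < required]
print(f"required length per series: {required}")
print(f"groups that are too short: {len(too_short)} of {span.size}")
```

If every group turns out to be too short, as it does here, two options are to reduce the encoder and prediction lengths so that their sum fits within a group, or to regroup the data so that each series covers at least that many time steps.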
