Extrapolating time series data into the future by repeating/scaling existing values



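I have hourly electricity consumption data for a single day. I want to use this data to "predict" the hourly consumption for the following days. Each value on the next day should equal the value for the same hour on the previous day, multiplied by a scaling factor f (e.g. 2).

The dataframe df looks like this: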

load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0

I would like the output dataframe df_ex to look like this:

load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0
2021-01-02 00:00:00   2.0
2021-01-02 01:00:00   2.0
2021-01-02 02:00:00   2.0
2021-01-02 03:00:00   2.0
2021-01-02 04:00:00   2.0
2021-01-02 05:00:00   2.0
2021-01-02 06:00:00   2.0
2021-01-02 07:00:00   6.0
2021-01-02 08:00:00   6.0
2021-01-02 09:00:00   6.0
2021-01-02 10:00:00   6.0
2021-01-02 11:00:00   6.0
2021-01-02 12:00:00   6.0
2021-01-02 13:00:00   6.0
2021-01-02 14:00:00   6.0
2021-01-02 15:00:00   6.0
2021-01-02 16:00:00   6.0
2021-01-02 17:00:00   6.0
2021-01-02 18:00:00   6.0
2021-01-02 19:00:00   6.0
2021-01-02 20:00:00   2.0
2021-01-02 21:00:00   2.0
2021-01-02 22:00:00   2.0
2021-01-02 23:00:00   2.0
2021-01-03 00:00:00   4.0
2021-01-03 01:00:00   4.0
2021-01-03 02:00:00   4.0
2021-01-03 03:00:00   4.0
2021-01-03 04:00:00   4.0
2021-01-03 05:00:00   4.0
2021-01-03 06:00:00   4.0
2021-01-03 07:00:00   12.0
2021-01-03 08:00:00   12.0
2021-01-03 09:00:00   12.0
2021-01-03 10:00:00   12.0
2021-01-03 11:00:00   12.0
2021-01-03 12:00:00   12.0
2021-01-03 13:00:00   12.0
2021-01-03 14:00:00   12.0
2021-01-03 15:00:00   12.0
2021-01-03 16:00:00   12.0
2021-01-03 17:00:00   12.0
2021-01-03 18:00:00   12.0
2021-01-03 19:00:00   12.0
2021-01-03 20:00:00   4.0
2021-01-03 21:00:00   4.0
2021-01-03 22:00:00   4.0
2021-01-03 23:00:00   4.0

I tried the following solution (with df defined as above):

import pandas as pd
import datetime
start = '2021-01-01 00:00'
end = '2021-01-03 23:00'
freq = 'H'
index = pd.date_range(start,
                      end,
                      freq=freq)
df_ex = df.reindex(index)
i = df_ex.index[0].day
f = 2.0
df_ex.loc[df_ex.index.day == i+1] = df_ex.loc[df_ex.index.day == i] * f
print(df_ex)

The result is:

load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
...                   ...
2021-01-03 19:00:00   NaN
2021-01-03 20:00:00   NaN
2021-01-03 21:00:00   NaN
2021-01-03 22:00:00   NaN
2021-01-03 23:00:00   NaN

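It seems my attempt to fill the rows after the first day with values did not work. The index is a DatetimeIndex.

Any suggestions on how to solve this would be much appreciated!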

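To create the data, you need to iterate one day at a time.

Assuming the original data contains at least one full day of hourly values, you can do this: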

import datetime as dt

import pandas as pd

start = "2021-01-01 00:00"
end = "2021-01-01 23:00"
freq = "h"  # hourly; the older alias "H" is deprecated in recent pandas versions
df = pd.DataFrame(
    {"load_kWh": [1.0] * 7 + [3.0] * 13 + [1.0] * 4},
    index=pd.date_range(start, end, freq=freq),
)


def add_days_to_df(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame:
    """Append number_of_days days, each equal to the previous day scaled by k."""
    data = data.copy()
    for _ in range(number_of_days):
        # Take the last 24 rows (one day of hourly data) as an independent copy
        day = data.iloc[-24:].copy()
        day.index += dt.timedelta(days=1)
        day *= k
        data = pd.concat((data, day))
    return data


print(add_days_to_df(data=df, number_of_days=2, k=2.0))
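As a side note on why the attempt in the question produces NaN: as far as I can tell, the cause is index alignment. When a DataFrame is assigned through .loc, pandas matches rows by index label, and the labels for 1 January never match those for 2 January, so nothing is written. The attempt also only targets day i+1, so the third day would stay empty in any case. A minimal sketch of the original reindex approach, assuming hourly data with complete days, that assigns by position via to_numpy() instead:

```python
import pandas as pd

# One day of hourly data, as in the question.
df = pd.DataFrame(
    {"load_kWh": [1.0] * 7 + [3.0] * 13 + [1.0] * 4},
    index=pd.date_range("2021-01-01 00:00", periods=24, freq="h"),
)

f = 2.0
# Extend the index to three days; the new rows start out as NaN.
df_ex = df.reindex(pd.date_range("2021-01-01 00:00", "2021-01-03 23:00", freq="h"))

# Midnight timestamps of each calendar day present in the index.
days = df_ex.index.normalize().unique()
for prev_day, next_day in zip(days[:-1], days[1:]):
    prev_mask = df_ex.index.normalize() == prev_day
    next_mask = df_ex.index.normalize() == next_day
    # to_numpy() drops the index, so the values are assigned by position
    # rather than being aligned on (non-matching) timestamps.
    df_ex.loc[next_mask, "load_kWh"] = df_ex.loc[prev_mask, "load_kWh"].to_numpy() * f

print(df_ex)
```

This only works if every day has the same number of rows; with gaps in the data the positional assignment would fail or misalign.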

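I managed to get a partial solution that works over years rather than days (copying/scaling the data from the same day of the previous year). It is only a partial solution because leap years are not yet taken into account.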

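def add_years_to_df(data: pd.DataFrame, target_year: int, k: float) -> pd.DataFrame:
    """Append one scaled copy of the base year for every year up to target_year."""
    base_year = data.index[0].year
    add = data.copy()
    for year in range(base_year + 1, target_year + 1):
        # Scale the previous year's values and move them to the new year
        add = k * add
        # Note: replace() raises ValueError for Feb 29 when the new year is not a leap year
        add.index = add.index.map(lambda t: t.replace(year=year))
        data = pd.concat((data, add))
    return data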

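lp is the input dataframe described in my original question, target_year is the last year to extrapolate to, and k is the multiplier.

The function is called with, for example: add_years_to_df(data=lp, target_year = 2030, k=1.1)

The result is: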

Datetime                load_kWh
2021-01-01 00:00:00     77.987500
2021-01-01 01:00:00     78.116667
2021-01-01 02:00:00     79.383333
2021-01-01 03:00:00     79.070833
2021-01-01 04:00:00     78.275000
...     ...
2030-12-31 19:00:00     247.373361
2030-12-31 20:00:00     74.889393
2030-12-31 21:00:00     71.883018
2030-12-31 22:00:00     73.101291
2030-12-31 23:00:00     72.438118
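One possible way to deal with the leap-year limitation mentioned above is to drop the 29 February rows before moving a copy into a non-leap year. The following is only a sketch under assumptions: the function name add_years_leap_safe is made up for illustration, the input is assumed to cover exactly one calendar year, and if the base year is not a leap year, 29 February will simply be missing in later leap years.

```python
import calendar

import pandas as pd


def add_years_leap_safe(data: pd.DataFrame, target_year: int, k: float) -> pd.DataFrame:
    """Append scaled copies of the base year, skipping Feb 29 where it does not exist."""
    base_year = data.index[0].year
    parts = [data]
    for n, year in enumerate(range(base_year + 1, target_year + 1), start=1):
        # Scaling by k once per year is the same as scaling the base year by k**n.
        add = data * k**n
        if not calendar.isleap(year):
            # Feb 29 has no counterpart in a non-leap year, so drop those rows.
            is_feb29 = (add.index.month == 2) & (add.index.day == 29)
            add = add[~is_feb29]
        add.index = add.index.map(lambda t: t.replace(year=year))
        parts.append(add)
    return pd.concat(parts)


# Hypothetical input: a constant hourly load for the leap year 2024.
idx = pd.date_range("2024-01-01 00:00", "2024-12-31 23:00", freq="h")
lp = pd.DataFrame({"load_kWh": 1.0}, index=idx)
out = add_years_leap_safe(lp, target_year=2025, k=1.1)
print(out.tail())
```

Collecting the yearly pieces in a list and concatenating once at the end also avoids repeatedly copying the growing dataframe inside the loop.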
