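I have hourly electricity consumption data for a single day. I want to use this data to "forecast" the hourly consumption for the following days. Each value for the next day should be the value from the same hour of the previous day, multiplied by a scaling factor f (for example 2).
The DataFrame df looks like this: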
load_kWh
2021-01-01 00:00:00 1.0
2021-01-01 01:00:00 1.0
2021-01-01 02:00:00 1.0
2021-01-01 03:00:00 1.0
2021-01-01 04:00:00 1.0
2021-01-01 05:00:00 1.0
2021-01-01 06:00:00 1.0
2021-01-01 07:00:00 3.0
2021-01-01 08:00:00 3.0
2021-01-01 09:00:00 3.0
2021-01-01 10:00:00 3.0
2021-01-01 11:00:00 3.0
2021-01-01 12:00:00 3.0
2021-01-01 13:00:00 3.0
2021-01-01 14:00:00 3.0
2021-01-01 15:00:00 3.0
2021-01-01 16:00:00 3.0
2021-01-01 17:00:00 3.0
2021-01-01 18:00:00 3.0
2021-01-01 19:00:00 3.0
2021-01-01 20:00:00 1.0
2021-01-01 21:00:00 1.0
2021-01-01 22:00:00 1.0
2021-01-01 23:00:00 1.0
I want the output DataFrame df_ex to look like this:
load_kWh
2021-01-01 00:00:00 1.0
2021-01-01 01:00:00 1.0
2021-01-01 02:00:00 1.0
2021-01-01 03:00:00 1.0
2021-01-01 04:00:00 1.0
2021-01-01 05:00:00 1.0
2021-01-01 06:00:00 1.0
2021-01-01 07:00:00 3.0
2021-01-01 08:00:00 3.0
2021-01-01 09:00:00 3.0
2021-01-01 10:00:00 3.0
2021-01-01 11:00:00 3.0
2021-01-01 12:00:00 3.0
2021-01-01 13:00:00 3.0
2021-01-01 14:00:00 3.0
2021-01-01 15:00:00 3.0
2021-01-01 16:00:00 3.0
2021-01-01 17:00:00 3.0
2021-01-01 18:00:00 3.0
2021-01-01 19:00:00 3.0
2021-01-01 20:00:00 1.0
2021-01-01 21:00:00 1.0
2021-01-01 22:00:00 1.0
2021-01-01 23:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 01:00:00 2.0
2021-01-02 02:00:00 2.0
2021-01-02 03:00:00 2.0
2021-01-02 04:00:00 2.0
2021-01-02 05:00:00 2.0
2021-01-02 06:00:00 2.0
2021-01-02 07:00:00 6.0
2021-01-02 08:00:00 6.0
2021-01-02 09:00:00 6.0
2021-01-02 10:00:00 6.0
2021-01-02 11:00:00 6.0
2021-01-02 12:00:00 6.0
2021-01-02 13:00:00 6.0
2021-01-02 14:00:00 6.0
2021-01-02 15:00:00 6.0
2021-01-02 16:00:00 6.0
2021-01-02 17:00:00 6.0
2021-01-02 18:00:00 6.0
2021-01-02 19:00:00 6.0
2021-01-02 20:00:00 2.0
2021-01-02 21:00:00 2.0
2021-01-02 22:00:00 2.0
2021-01-02 23:00:00 2.0
2021-01-03 00:00:00 4.0
2021-01-03 01:00:00 4.0
2021-01-03 02:00:00 4.0
2021-01-03 03:00:00 4.0
2021-01-03 04:00:00 4.0
2021-01-03 05:00:00 4.0
2021-01-03 06:00:00 4.0
2021-01-03 07:00:00 12.0
2021-01-03 08:00:00 12.0
2021-01-03 09:00:00 12.0
2021-01-03 10:00:00 12.0
2021-01-03 11:00:00 12.0
2021-01-03 12:00:00 12.0
2021-01-03 13:00:00 12.0
2021-01-03 14:00:00 12.0
2021-01-03 15:00:00 12.0
2021-01-03 16:00:00 12.0
2021-01-03 17:00:00 12.0
2021-01-03 18:00:00 12.0
2021-01-03 19:00:00 12.0
2021-01-03 20:00:00 4.0
2021-01-03 21:00:00 4.0
2021-01-03 22:00:00 4.0
2021-01-03 23:00:00 4.0
I tried the following solution (with df as defined above):
import pandas as pd
import datetime

start = '2021-01-01 00:00'
end = '2021-01-03 23:00'
freq = 'H'
index = pd.date_range(start, end, freq=freq)
df_ex = df.reindex(index)
i = df_ex.index[0].day
f = 2.0
df_ex.loc[df_ex.index.day == i+1] = df_ex.loc[df_ex.index.day == i] * f
print(df_ex)
The result is:
load_kWh
2021-01-01 00:00:00 1.0
2021-01-01 01:00:00 1.0
2021-01-01 02:00:00 1.0
2021-01-01 03:00:00 1.0
2021-01-01 04:00:00 1.0
... ...
2021-01-03 19:00:00 NaN
2021-01-03 20:00:00 NaN
2021-01-03 21:00:00 NaN
2021-01-03 22:00:00 NaN
2021-01-03 23:00:00 NaN
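It looks like my attempt to fill the rows after the first day with values did not work. The index is a DatetimeIndex.
Any suggestions on how to solve this would be much appreciated!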
To create the data, you need to iterate one day at a time.
Assuming the original data contains at least one full day, you can do this:
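import datetime as dt
import itertools

import pandas as pd

start = "2021-01-01 00:00"
end = "2021-01-01 23:00"
freq = "h"  # lowercase alias; the uppercase "H" is deprecated since pandas 2.2

# Rebuild the sample data from the question: 7 h at 1.0, 13 h at 3.0, 4 h at 1.0
df = pd.DataFrame(
    {"load_kWh": list(itertools.chain([1.0] * 7, [3.0] * 13, [1.0] * 4))},
    index=pd.date_range(start, end, freq=freq),
)


def add_days_to_df(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame:
    data = data.copy()
    for _ in range(number_of_days):
        # Take the last 24 hourly rows as an independent copy
        day = data.iloc[-24:].copy()
        # Shift the timestamps one day ahead and scale the values by k
        day.index = day.index + dt.timedelta(days=1)
        day = day * k
        data = pd.concat((data, day))
    return data


print(add_days_to_df(data=df, number_of_days=2, k=2.0))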
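As for why the attempt in the question leaves NaN behind: when a DataFrame is assigned through .loc, pandas aligns the right-hand side on its index labels. The right-hand side carries the timestamps of day i, the target rows carry those of day i+1, no labels match, and NaN is written. The attempt also fills only one day. The following is a minimal sketch of a fix that keeps the reindex approach, assuming the input holds exactly one complete day of hourly rows; fill_days is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data from the question: one day of hourly values
day_one = pd.date_range("2021-01-01 00:00", "2021-01-01 23:00", freq="h")
df = pd.DataFrame({"load_kWh": [1.0] * 7 + [3.0] * 13 + [1.0] * 4}, index=day_one)


def fill_days(data: pd.DataFrame, end: str, f: float) -> pd.DataFrame:
    """Extend `data` up to `end`, scaling each new day from the previous one.

    Assumes `data` contains exactly one complete day of hourly rows.
    """
    full_index = pd.date_range(data.index[0], end, freq="h")
    out = data.reindex(full_index)
    # Compare whole dates rather than day-of-month so month ends are handled
    dates = out.index.normalize()
    days = dates.unique()
    for prev, cur in zip(days[:-1], days[1:]):
        # to_numpy() drops the index, so the assignment is positional
        # instead of label-aligned
        out.loc[dates == cur] = out.loc[dates == prev].to_numpy() * f
    return out


df_ex = fill_days(df, "2021-01-03 23:00", f=2.0)
print(df_ex)
```

Stripping the index with to_numpy() is the only change needed to make the assignment stick; the loop just repeats it for every day in the range.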
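I managed to get a partial solution that works for years instead of days (it copies and scales the data from the same day of the previous year). It is only a partial solution because leap years are not yet taken into account.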
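import pandas as pd


def add_years_to_df(data: pd.DataFrame, target_year: int, k: float) -> pd.DataFrame:
    base_year = data.index[0].year
    add = data.copy()
    for year in range(base_year + 1, target_year + 1):
        # Scale the previous year's values and move them to the next year
        add = k * add
        # Leap years are not handled: replace() raises ValueError for Feb 29
        # when `year` is not a leap year
        add.index = add.index.map(lambda t: t.replace(year=year))
        data = pd.concat((data, add))
    return data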
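lp is the input DataFrame described in my original question, target_year is the year up to which the data is extrapolated, and k is the multiplier.
Calling the function with, for example, add_years_to_df(data=lp, target_year=2030, k=1.1)
gives: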
Datetime load_kWh
2021-01-01 00:00:00 77.987500
2021-01-01 01:00:00 78.116667
2021-01-01 02:00:00 79.383333
2021-01-01 03:00:00 79.070833
2021-01-01 04:00:00 78.275000
... ...
2030-12-31 19:00:00 247.373361
2030-12-31 20:00:00 74.889393
2030-12-31 21:00:00 71.883018
2030-12-31 22:00:00 73.101291
2030-12-31 23:00:00 72.438118
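One possible way to close the leap-year gap is to build each target year's hourly index directly and look up the base-year value for the same month, day and hour. The sketch below assumes the input covers one full calendar year of hourly rows with a timezone-naive index. It also assumes that a Feb 29 in a target year may reuse the base year's Feb 28 values when the base year has no leap day; that fallback is an illustrative choice, not something stated in the answer above. The names hour_of_year_key and extrapolate_years are hypothetical.

```python
import numpy as np
import pandas as pd


def hour_of_year_key(idx: pd.DatetimeIndex, day: np.ndarray) -> np.ndarray:
    """Encode (month, day, hour) as one integer, ignoring the year."""
    return idx.month.to_numpy() * 10000 + day * 100 + idx.hour.to_numpy()


def extrapolate_years(data: pd.DataFrame, target_year: int, k: float) -> pd.DataFrame:
    """Append one scaled copy of the base year per year up to `target_year`."""
    base_year = data.index[0].year
    # Re-key the base year so rows can be looked up independently of the year
    base = data.copy()
    base.index = hour_of_year_key(data.index, data.index.day.to_numpy())
    has_feb29 = 22900 in base.index  # key of Feb 29, 00:00
    parts = [data]
    for n, year in enumerate(range(base_year + 1, target_year + 1), start=1):
        idx = pd.date_range(f"{year}-01-01 00:00", f"{year}-12-31 23:00", freq="h")
        day = idx.day.to_numpy()
        if not has_feb29:
            # Assumption: a leap day in the target year reuses Feb 28
            day = np.where((idx.month == 2) & (day == 29), 28, day)
        # k compounds once per year, matching the loop in add_years_to_df
        values = base.reindex(hour_of_year_key(idx, day)).to_numpy() * k**n
        parts.append(pd.DataFrame(values, index=idx, columns=data.columns))
    return pd.concat(parts)


# Hypothetical base year 2023 (not a leap year); load equals the hour of day
idx = pd.date_range("2023-01-01 00:00", "2023-12-31 23:00", freq="h")
lp = pd.DataFrame({"load_kWh": idx.hour.to_numpy(dtype=float)}, index=idx)
out = extrapolate_years(lp, target_year=2024, k=1.1)
print(out.loc[pd.Timestamp("2024-02-29 05:00")])
```

When the base year is a leap year and a target year is not, the target index simply contains no Feb 29, so those base rows are skipped rather than raising an error.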