python - Convert a df to a time series



I have a df describing transactions, for example

transaction   start_in_s_since_epoch    duration_in_s    charged_energy_in_wh
1             1.457423e+09              1821.0           1732
2             1.457389e+09              35577.0          18397
3             1.457425e+09              2.0              0
[...]

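I assume charged_energy is distributed linearly over the duration of a transaction. I want to convert the df into a time series with a granularity of one day. Within each day, both charged_energy and duration should be summed.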

day                sum_duration_in_s   sum_charged_energy_in_wh
2016-03-16 00:00   123                 456
2016-03-17 00:00   456                 789
2016-03-18 00:00   789                 012
[...]

Any ideas? I am struggling with the day boundaries. A transaction such as

transaction   start_in_s_since_epoch    duration_in_s    charged_energy_in_wh
500           1620777300                600              1000

should be split evenly across

day                sum_duration_in_s   sum_charged_energy_in_wh
2021-05-11 00:00   300                 500
2021-05-12 00:00   300                 500
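
To make the boundary handling concrete, here is a minimal sketch of the split for a single transaction, using only the standard library. The helper name `split_by_day` is hypothetical, days are taken as UTC days, and a zero-duration transaction is assumed to belong entirely to its start day.

```python
from datetime import datetime, timezone

DAY_IN_S = 24 * 60 * 60


def split_by_day(start_s, duration_s, energy_wh):
    """Split one transaction into per-day slices (UTC days).

    Hypothetical helper. Assumes energy is charged linearly over the
    duration, so each day's energy share equals its share of the time.
    Returns a list of (day, seconds_in_day, energy_in_day) tuples.
    """
    end_s = start_s + duration_s
    # Midnight (UTC) at or before the start of the transaction.
    day_start = start_s - start_s % DAY_IN_S
    slices = []
    while True:
        day_end = day_start + DAY_IN_S
        # Overlap of [start_s, end_s) with [day_start, day_end).
        seconds = min(end_s, day_end) - max(start_s, day_start)
        # Assumption: a zero-duration transaction keeps all its energy
        # on the start day.
        energy = energy_wh * seconds / duration_s if duration_s else energy_wh
        day = datetime.fromtimestamp(day_start, tz=timezone.utc).date()
        slices.append((day, seconds, energy))
        if end_s <= day_end:
            break
        day_start = day_end
    return slices


for day, seconds, energy in split_by_day(1620777300, 600, 1000):
    print(day, seconds, energy)
```

For transaction 500 this yields two slices of 300 s and 500 Wh each, one for 2021-05-11 and one for 2021-05-12, because the transaction starts at 23:55:00 UTC.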

This did it for me. It is slow, but it works:

import pandas as pd
from datetime import datetime
from datetime_truncate import truncate

day_in_s = 60*60*24
rows = []  # one dict per (transaction, day) slice; DataFrame.append was removed in pandas 2.0
for index, row in df.iterrows():
    start = row.start_in_s_since_epoch
    time = row.duration_in_s
    # energy is assumed to be charged linearly over the transaction
    energy_per_s = row.charged_energy_in_wh / row.duration_in_s
    # seconds from the start of the transaction until the next midnight
    till_midnight_in_s = truncate(pd.to_datetime(start + day_in_s, unit='s'), 'day').timestamp() - start
    rest_in_s = time - till_midnight_in_s

    # slice that falls on the day the transaction starts
    rows.append({'day': truncate(pd.to_datetime(start, unit='s'), 'day'),
                 'sum_duration_in_s': min(time, till_midnight_in_s),
                 'sum_charged_energy_in_wh': min(time, till_midnight_in_s) * energy_per_s})

    # one further slice per following day until the duration is used up
    while rest_in_s > 0:
        start += day_in_s
        rows.append({'day': truncate(pd.to_datetime(start, unit='s'), 'day'),
                     'sum_duration_in_s': min(rest_in_s, day_in_s),
                     'sum_charged_energy_in_wh': min(rest_in_s, day_in_s) * energy_per_s})
        rest_in_s = rest_in_s - day_in_s

df_tmp = pd.DataFrame(rows)
# group by the 'day' column built above (the original grouped by a non-existent 'date')
df_ts = df_tmp.groupby('day').agg({'sum_charged_energy_in_wh': 'sum',
                                   'sum_duration_in_s': 'sum'}).sort_index()
# insert days without any transaction as zero rows
df_ts = df_ts.asfreq('D', fill_value=0)
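
If the row-by-row loop is too slow, the same split can probably be done without `iterrows`. The following is only a sketch under the same assumptions (linear charging, UTC days), not a tested replacement for the code above. The function name `daily_series` is hypothetical, and zero-duration transactions are assumed to keep all their energy on the start day.

```python
import numpy as np
import pandas as pd

DAY_IN_S = 24 * 60 * 60


def daily_series(df):
    """Hypothetical vectorised variant: one output row per UTC day."""
    start = df['start_in_s_since_epoch'].to_numpy(dtype=float)
    duration = df['duration_in_s'].to_numpy(dtype=float)
    energy = df['charged_energy_in_wh'].to_numpy(dtype=float)
    end = start + duration

    # Day index (days since epoch) of the first and last day touched.
    first_day = np.floor(start / DAY_IN_S).astype(np.int64)
    # A transaction ending exactly at midnight does not touch the next day.
    last_day = np.maximum(first_day, np.ceil(end / DAY_IN_S).astype(np.int64) - 1)
    n_days = last_day - first_day + 1

    # Repeat every transaction once per day it touches.
    idx = np.repeat(np.arange(len(df)), n_days)
    # 0, 1, 2, ... within each transaction's block of repeated rows.
    offset = np.arange(n_days.sum()) - np.repeat(np.cumsum(n_days) - n_days, n_days)
    day_start = (first_day[idx] + offset) * DAY_IN_S

    # Overlap of each transaction with each of its days, in seconds.
    seconds = np.minimum(end[idx], day_start + DAY_IN_S) - np.maximum(start[idx], day_start)
    # Linear charging: the energy share equals the time share.
    # Assumption: zero-duration transactions get a share of 1.
    dur = duration[idx]
    share = np.divide(seconds, dur, out=np.ones_like(seconds), where=dur > 0)

    slices = pd.DataFrame({
        'day': pd.to_datetime(day_start, unit='s'),
        'sum_duration_in_s': seconds,
        'sum_charged_energy_in_wh': energy[idx] * share,
    })
    # Sum per day, then insert days without transactions as zero rows.
    return slices.groupby('day').sum().asfreq('D', fill_value=0)


example = pd.DataFrame({
    'start_in_s_since_epoch': [1620777300],
    'duration_in_s': [600],
    'charged_energy_in_wh': [1000],
})
print(daily_series(example))
```

The idea is to expand each transaction into one row per day with `np.repeat` and compute all overlaps at once, so the only pandas work left is a single `groupby`.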
