How can I sum event durations into period buckets?



Sample data

import pandas as pd

data = [('2020-06-23 13:43', '2020-06-24 06:43'),
        ('2020-06-23 18:30', '2020-06-24 11:30'),
        ('2020-06-23 14:57', '2020-06-24 07:07'),
        ('2020-06-23 16:37', '2020-06-24 09:03'),
        ('2020-06-23 21:25', '2020-06-24 12:54')]
# events, one row per (start, end) pair, parsed to datetime64
df1 = pd.DataFrame(data, columns=['start', 'end']).apply(pd.to_datetime)
# hourly buckets to evaluate ('h' is the hourly alias; 'H' is deprecated in pandas >= 2.2)
df2 = pd.DataFrame(index=pd.period_range(start='2020-06-23 13:00', end='2020-06-24 12:00', freq='h'))

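I want to count concurrent events per hour, i.e. what share of each hour had 0, 1, 2, ... events running at the same time. The output I'm after, as percentages, looks like this: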

                    p0    p1    p2    p3    p4    p5  m0  m1  m2  m3  m4  m5
2020-06-23 13:00  71.7  28.3   0.0   0.0   0.0   0.0  43  17   0   0   0   0
2020-06-23 14:00   0.0  95.0   5.0   0.0   0.0   0.0   0  57   3   0   0   0
...
2020-06-23 21:00   0.0   0.0   0.0   0.0  41.7  58.3   0   0   0   0  25  35
...
2020-06-24 12:00  10.0  90.0   0.0   0.0   0.0   0.0   6  54   0   0   0   0

(m0 to m5 are intermediate results in minutes; what I actually need is p0 to p5, as percentages of the hour.)
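To pin down what m0 to m5 mean, here is a brute-force sketch using only the standard library: it walks through one hour minute by minute and counts how many events are active at each minute. It assumes events are half-open intervals [start, end) and that all timestamps fall on whole minutes (true for this sample data); `minutes_by_level` and `percents` are hypothetical helper names, not pandas functions.

```python
from datetime import datetime, timedelta

# Sample events from the question, as (start, end) strings.
DATA = [('2020-06-23 13:43', '2020-06-24 06:43'),
        ('2020-06-23 18:30', '2020-06-24 11:30'),
        ('2020-06-23 14:57', '2020-06-24 07:07'),
        ('2020-06-23 16:37', '2020-06-24 09:03'),
        ('2020-06-23 21:25', '2020-06-24 12:54')]
FMT = '%Y-%m-%d %H:%M'
EVENTS = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in DATA]


def minutes_by_level(hour_start, events=EVENTS, max_level=5):
    """Return [m0, ..., m_max_level] for the hour beginning at hour_start.

    Brute force (assumes minute-aligned timestamps): for each minute of the
    hour, count the events whose half-open interval [start, end) contains it.
    """
    minutes = [0] * (max_level + 1)
    for i in range(60):
        t = hour_start + timedelta(minutes=i)
        active = sum(1 for s, e in events if s <= t < e)
        minutes[active] += 1
    return minutes


def percents(minutes):
    """Convert minute counts to percentages of the hour, rounded to 0.1."""
    return [round(m / 60 * 100, 1) for m in minutes]


m = minutes_by_level(datetime(2020, 6, 23, 13, 0))
print(m)            # [43, 17, 0, 0, 0, 0]
print(percents(m))  # [71.7, 28.3, 0.0, 0.0, 0.0, 0.0]
```

This is far too slow for real data, but it reproduces the m and p values in the table above, which makes it a handy reference when checking a vectorised solution.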

I don't know where to start, but surely pandas has some nice functions for this?

It seems I have found a working solution myself:

import pandas as pd

# interval to analyze
start = pd.Timestamp('2020-06-23 16:00')
end = pd.Timestamp('2020-06-24 10:00')

# events
data = [('2020-06-23 13:43', '2020-06-24 06:43'),
        ('2020-06-23 18:30', '2020-06-24 11:30'),
        ('2020-06-23 14:57', '2020-06-24 07:07'),
        ('2020-06-23 16:37', '2020-06-24 09:03'),
        ('2020-06-23 21:25', '2020-06-24 12:54')]
df1 = pd.DataFrame(data, columns=['start', 'end']).apply(pd.to_datetime)

# evaluation: one row per hour mark (change 0), per event start (+1)
# and per event end (-1); DataFrame.append was removed in pandas 2.0, so use concat
df2 = pd.concat([
    pd.DataFrame(dict(time=pd.date_range(start=start, end=end, freq='h'), change=0)),
    pd.DataFrame(dict(time=df1.start, change=1)),
    pd.DataFrame(dict(time=df1.end, change=-1)),
], ignore_index=True).sort_values('time')
# running total of +1/-1 = number of events active from this row's time on
df2['nevents'] = df2.change.cumsum()
# length of the segment until the next boundary
df2['delta'] = df2.time.shift(-1) - df2.time
# share of the hour covered by that segment, in percent
df2['percent'] = df2.delta.dt.total_seconds() / 3600.0 * 100
df2['interval'] = df2.time.dt.floor('h')
df2 = df2[(df2.interval >= start) & (df2.interval < end)]
df2 = df2.groupby(['interval', 'nevents']).percent.sum()
df2 = df2.unstack(fill_value=0)
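For comparison, the same boundary-sweep idea the pandas code relies on (hour marks with change 0, starts +1, ends -1, a running sum of the changes, and each segment's length credited to the hour it starts in) can be written in plain Python. This is a sketch for cross-checking the numbers, not a replacement for the pandas version; `hourly_concurrency` is a hypothetical helper name, and it assumes naive datetimes and half-open [start, end) events.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M'
DATA = [('2020-06-23 13:43', '2020-06-24 06:43'),
        ('2020-06-23 18:30', '2020-06-24 11:30'),
        ('2020-06-23 14:57', '2020-06-24 07:07'),
        ('2020-06-23 16:37', '2020-06-24 09:03'),
        ('2020-06-23 21:25', '2020-06-24 12:54')]
EVENTS = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in DATA]


def hourly_concurrency(events, start, end):
    """Sweep-line sketch: minutes spent at each concurrency level per hour.

    Returns {hour_start: {level: minutes}} for hours in [start, end).
    """
    # Boundary points: +1 at each event start, -1 at each event end,
    # 0 at each hour mark so that no segment crosses an hour boundary.
    points = [(s, +1) for s, _ in events] + [(e, -1) for _, e in events]
    t = start
    while t <= end:
        points.append((t, 0))
        t += timedelta(hours=1)
    points.sort(key=lambda p: p[0])

    result = defaultdict(lambda: defaultdict(float))
    level = 0
    for (t0, change), (t1, _) in zip(points, points[1:]):
        level += change  # number of events active on [t0, t1)
        hour = t0.replace(minute=0, second=0, microsecond=0)  # like dt.floor('h')
        if start <= hour < end:
            result[hour][level] += (t1 - t0).total_seconds() / 60
    return {h: dict(levels) for h, levels in result.items()}


res = hourly_concurrency(EVENTS, datetime(2020, 6, 23, 13), datetime(2020, 6, 24, 13))
for hour in sorted(res):
    row = [round(res[hour].get(k, 0.0) / 60 * 100, 1) for k in range(6)]
    print(hour.strftime(FMT), row)
```

As in the pandas version, events that started before `start` are still counted, because the running sum is taken over all points and only the accumulation is restricted to the window.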

I'd love to see improvements.
