Sample data:
import pandas as pd

# events as (start, end) pairs
data = [('2020-06-23 13:43', '2020-06-24 06:43'),
        ('2020-06-23 18:30', '2020-06-24 11:30'),
        ('2020-06-23 14:57', '2020-06-24 07:07'),
        ('2020-06-23 16:37', '2020-06-24 09:03'),
        ('2020-06-23 21:25', '2020-06-24 12:54')]
df1 = pd.DataFrame(data, columns=['start', 'end'], dtype='datetime64[ns]')
# empty frame with one row per hour, to hold the result
df2 = pd.DataFrame(index=pd.period_range(start='2020-06-23 13:00', end='2020-06-24 12:00', freq='H'))
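I want to count parallel events: for each hour, how much of the time 0, 1, 2, ... events were running at once. My desired output is in percent, like this: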
p0 p1 p2 p3 p4 p5 m0 m1 m2 m3 m4 m5
2020-06-23 13:00 71,7 28,3 0,0 0,0 0,0 0,0 43 17 0 0 0 0
2020-06-23 14:00 0,0 95,0 5,0 0,0 0,0 0,0 0 57 3 0 0 0
...
2020-06-23 21:00 0,0 0,0 0,0 0,0 41,7 58,3 0 0 0 0 25 35
...
2020-06-24 12:00 10,0 90,0 0,0 0,0 0,0 0,0 6 54 0 0 0 0
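(m0 to m5 are intermediate results in minutes; what I need are p0 to p5, the percentages.)
I don't know how to start, but surely pandas has some nice functionality for handling this?
It seems I have found a workable solution myself: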
import pandas as pd
# interval to analyze
start = pd.Timestamp('2020-06-23 16:00')
end = pd.Timestamp('2020-06-24 10:00')
# events
data = [('2020-06-23 13:43', '2020-06-24 06:43'),
        ('2020-06-23 18:30', '2020-06-24 11:30'),
        ('2020-06-23 14:57', '2020-06-24 07:07'),
        ('2020-06-23 16:37', '2020-06-24 09:03'),
        ('2020-06-23 21:25', '2020-06-24 12:54')]
df1 = pd.DataFrame(data, columns=['start', 'end'], dtype='datetime64[ns]')
# evaluation
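# hourly markers contribute no change; each start adds 1, each end subtracts 1
df2 = pd.DataFrame(dict(
    time=pd.date_range(start=start, end=end, freq='H'),
    change=0))
# DataFrame.append was removed in pandas 2.0, so the frames are combined with pd.concat
df2 = pd.concat([
    df2,
    pd.DataFrame(dict(time=df1.start, change=1)),
    pd.DataFrame(dict(time=df1.end, change=-1))],
    ignore_index=True).sort_values('time')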
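# number of events running from each timestamp until the next one
df2['nevents'] = df2.change.cumsum()
# duration of each segment
df2['delta'] = df2.time.shift(-1) - df2.time
# share of the hour as a fraction between 0 and 1 (multiply by 100 for percent)
df2['percent'] = df2.delta.dt.total_seconds() / 3600.0
# hour that each segment belongs to
df2['interval'] = df2.time.dt.floor('H')
# keep only the segments inside the analyzed interval
df2 = df2[(df2.interval >= start) & (df2.interval < end)]
# one row per hour, one column per number of parallel events
df2 = df2.groupby(['interval', 'nevents']).percent.sum()
df2 = df2.unstack(fill_value=0)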
I would love to see improvements.
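One possible refinement, offered as a sketch rather than a definitive answer: the same sweep over start/end events can be wrapped in a function that returns both the minutes (m0 to m5) and the percentages (p0 to p5) in the layout asked for in the question. The function name `parallel_event_shares` and its parameters are made up for this illustration, and the sketch assumes that `start` and `end` fall on full hours and that the event columns are already datetimes.

```python
import pandas as pd

HOUR = pd.offsets.Hour()


def parallel_event_shares(events, start, end):
    """Hypothetical helper: per hour in [start, end), the minutes (m*) and
    percent (p*) of time during which 0, 1, 2, ... events ran in parallel."""
    # Hourly markers change nothing; each start adds 1, each end subtracts 1.
    steps = pd.concat([
        pd.DataFrame({'time': pd.date_range(start, end, freq=HOUR), 'change': 0}),
        pd.DataFrame({'time': events['start'], 'change': 1}),
        pd.DataFrame({'time': events['end'], 'change': -1}),
    ], ignore_index=True).sort_values('time')

    # Events running from each timestamp until the next one, and for how long.
    steps['nevents'] = steps['change'].cumsum()
    gap = steps['time'].shift(-1) - steps['time']
    steps['minutes'] = gap.dt.total_seconds() / 60
    steps['interval'] = steps['time'].dt.floor(HOUR)

    # Keep only segments that lie inside the analyzed window.
    steps = steps[(steps['interval'] >= start) & (steps['interval'] < end)]

    minutes = (steps.groupby(['interval', 'nevents'])['minutes']
               .sum()
               .unstack(fill_value=0)
               # Always show columns 0..len(events), even if a level never occurs.
               .reindex(columns=range(len(events) + 1), fill_value=0))
    percent = minutes / 60 * 100
    return pd.concat([percent.add_prefix('p'), minutes.add_prefix('m')], axis=1)


data = [('2020-06-23 13:43', '2020-06-24 06:43'),
        ('2020-06-23 18:30', '2020-06-24 11:30'),
        ('2020-06-23 14:57', '2020-06-24 07:07'),
        ('2020-06-23 16:37', '2020-06-24 09:03'),
        ('2020-06-23 21:25', '2020-06-24 12:54')]
events = pd.DataFrame(data, columns=['start', 'end']).apply(pd.to_datetime)

result = parallel_event_shares(events,
                               pd.Timestamp('2020-06-23 13:00'),
                               pd.Timestamp('2020-06-24 13:00'))
print(result.round(1))
```

Because every hour is split at the hourly markers, the m columns of each row add up to 60 minutes, which makes for a quick sanity check. Reindexing to a fixed set of columns keeps the layout stable even when some level of parallelism never occurs in the window.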