Python Pandas: flattening a calendar with overlapping meetings to get the actual time spent in meetings

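I have the details of my weekly calendar (subjects obviously changed to protect the innocent) read into a pandas DataFrame. One of my goals is to get the total time spent in meetings. I'd like a DataFrame indexed by a date_range with hourly frequency for the week, showing how many total minutes I was in meetings during each hour. My first challenge is that meetings overlap, and as much as I'd like to be in two places at once, I'm certainly not. I do hop out of one and into another, though. So, for example, the rows at index 8 and 9 should give a total meeting time of 90 minutes, not the 120 minutes I would get if I just called df['Duration'].sum() on the column. How do I flatten the time periods in the DataFrame so that overlaps are counted only once? There seems to be an answer involving date_range and periods, but I can't wrap my head around it. Below is my DataFrame df.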

               Start                End Duration    Subject
0   07/04/16 10:30:00  07/04/16 11:00:00    30      Inspirational Poster Design Session
1   07/04/16 15:00:00  07/04/16 15:30:00    30      Corporate Speak Do's and Don'ts
2   07/04/16 09:00:00  07/04/16 12:00:00    180     Metrics or Matrix -Panel Discussion
3   07/04/16 13:30:00  07/04/16 15:00:00    90      "Do More with Less" kickoff party
4   07/05/16 09:00:00  07/05/16 10:00:00    60      Fiscal or Physical -Panel Discussion
5   07/05/16 14:00:00  07/05/16 14:30:00    30      "Why we can't have nice thing" training video
6   07/06/16 15:00:00  07/06/16 16:00:00    60      One-on-One with manager -Panel Discussion
7   07/06/16 09:00:00  07/06/16 10:00:00    60      Fireing for Performance leadership session
8   07/06/16 13:00:00  07/06/16 14:00:00    60      Birthday Cake in the conference room *MANDATORY*
9   07/06/16 12:30:00  07/06/16 13:30:00    60      Obligatory lunchtime meeting because it was the only time everyone had avaiable

Any help would be greatly appreciated.

EDIT: This is the output I would hope to get for the dataset above.

2016-07-04 00:00:00 0
2016-07-04 01:00:00 0
2016-07-04 02:00:00 0
2016-07-04 03:00:00 0
2016-07-04 04:00:00 0
2016-07-04 05:00:00 0
2016-07-04 06:00:00 0
2016-07-04 07:00:00 0
2016-07-04 08:00:00 0
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
2016-07-06 16:00:00 0
2016-07-06 17:00:00 0
2016-07-06 18:00:00 0
2016-07-06 19:00:00 0
2016-07-06 20:00:00 0
2016-07-06 21:00:00 0
2016-07-06 22:00:00 0
2016-07-06 23:00:00 0
2016-07-07 00:00:00 0

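One possibility is to create a time series indexed by minute (s below) that tracks whether you are in a meeting during that minute, and then resample it by hour. To match your desired output, you can adjust the start and end times of s's index.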

import io
import pandas as pd
data = io.StringIO('''
Start,End,Duration,Subject
0,07/04/16 10:30:00,07/04/16 11:00:00,30,Inspirational Poster Design Session
1,07/04/16 15:00:00,07/04/16 15:30:00,30,Corporate Speak Do's and Don'ts
2,07/04/16 09:00:00,07/04/16 12:00:00,180,Metrics or Matrix -Panel Discussion
3,07/04/16 13:30:00,07/04/16 15:00:00,90,"Do More with Less" kickoff party
4,07/05/16 09:00:00,07/05/16 10:00:00,60,Fiscal or Physical -Panel Discussion
5,07/05/16 14:00:00,07/05/16 14:30:00,30,"Why we can't have nice thing" training video
6,07/06/16 15:00:00,07/06/16 16:00:00,60,One-on-One with manager -Panel Discussion
7,07/06/16 09:00:00,07/06/16 10:00:00,60,Fireing for Performance leadership session
8,07/06/16 13:00:00,07/06/16 14:00:00,60,Birthday Cake in the conference room *MANDATORY*
9,07/06/16 12:30:00,07/06/16 13:30:00,60,Obligatory lunchtime meeting because it was the only time everyone
''')
df = pd.read_csv(data, usecols=['Start', 'End', 'Subject'])
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])
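# Label-based slices on a DatetimeIndex include the right endpoint, so
# subtract one minute to keep each meeting's end minute out of the range
tdel = pd.Timedelta('1min')
# One boolean per minute between the first start and the last end
s = pd.Series(False, index=pd.date_range(start=df['Start'].min(),
                                         end=df['End'].max()-tdel,
                                         freq='min'))
# Mark every minute covered by a meeting; overlapping minutes are set
# to True only once, so they are not double counted
for _, meeting in df.iterrows():
    s[meeting['Start'] : meeting['End']-tdel] = True
# Count the True minutes in each hourly bin
# (pandas 2.2 and later spell the hourly alias 'h' and warn on 'H')
result = s.resample('1H').sum().astype(int)
print(result)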

Output:

2016-07-04 09:00:00    60
2016-07-04 10:00:00    60
2016-07-04 11:00:00    60
2016-07-04 12:00:00     0
2016-07-04 13:00:00    30
2016-07-04 14:00:00    60
2016-07-04 15:00:00    30
2016-07-04 16:00:00     0
2016-07-04 17:00:00     0
2016-07-04 18:00:00     0
2016-07-04 19:00:00     0
2016-07-04 20:00:00     0
2016-07-04 21:00:00     0
2016-07-04 22:00:00     0
2016-07-04 23:00:00     0
2016-07-05 00:00:00     0
2016-07-05 01:00:00     0
2016-07-05 02:00:00     0
2016-07-05 03:00:00     0
2016-07-05 04:00:00     0
2016-07-05 05:00:00     0
2016-07-05 06:00:00     0
2016-07-05 07:00:00     0
2016-07-05 08:00:00     0
2016-07-05 09:00:00    60
2016-07-05 10:00:00     0
2016-07-05 11:00:00     0
2016-07-05 12:00:00     0
2016-07-05 13:00:00     0
2016-07-05 14:00:00    30
2016-07-05 15:00:00     0
2016-07-05 16:00:00     0
2016-07-05 17:00:00     0
2016-07-05 18:00:00     0
2016-07-05 19:00:00     0
2016-07-05 20:00:00     0
2016-07-05 21:00:00     0
2016-07-05 22:00:00     0
2016-07-05 23:00:00     0
2016-07-06 00:00:00     0
2016-07-06 01:00:00     0
2016-07-06 02:00:00     0
2016-07-06 03:00:00     0
2016-07-06 04:00:00     0
2016-07-06 05:00:00     0
2016-07-06 06:00:00     0
2016-07-06 07:00:00     0
2016-07-06 08:00:00     0
2016-07-06 09:00:00    60
2016-07-06 10:00:00     0
2016-07-06 11:00:00     0
2016-07-06 12:00:00    30
2016-07-06 13:00:00    60
2016-07-06 14:00:00     0
2016-07-06 15:00:00    60
Freq: H, dtype: int64
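
The output above starts at the first meeting and stops at the last one, while the desired output in the question runs from midnight on the first day to midnight after the last day. One way to adjust the index of the minute series, as suggested above, might look like the sketch below. It is only a sketch under a few assumptions: the names meetings, minutes and hourly are made up for this example, the Subject and Duration columns are left out for brevity, and '60min' is used as the bin size because that spelling is accepted by both older and newer pandas releases.

```python
import pandas as pd

# Start/End pairs copied from the question; Subject and Duration omitted.
meetings = pd.DataFrame(
    [
        ("07/04/16 10:30:00", "07/04/16 11:00:00"),
        ("07/04/16 15:00:00", "07/04/16 15:30:00"),
        ("07/04/16 09:00:00", "07/04/16 12:00:00"),
        ("07/04/16 13:30:00", "07/04/16 15:00:00"),
        ("07/05/16 09:00:00", "07/05/16 10:00:00"),
        ("07/05/16 14:00:00", "07/05/16 14:30:00"),
        ("07/06/16 15:00:00", "07/06/16 16:00:00"),
        ("07/06/16 09:00:00", "07/06/16 10:00:00"),
        ("07/06/16 13:00:00", "07/06/16 14:00:00"),
        ("07/06/16 12:30:00", "07/06/16 13:30:00"),
    ],
    columns=["Start", "End"],
)
fmt = "%m/%d/%y %H:%M:%S"
meetings["Start"] = pd.to_datetime(meetings["Start"], format=fmt)
meetings["End"] = pd.to_datetime(meetings["End"], format=fmt)

one_min = pd.Timedelta("1min")

# Assumption: pad the index to whole days, from midnight of the first day
# through midnight following the last day (both endpoints included).
first = meetings["Start"].min().normalize()
last = meetings["End"].max().normalize() + pd.Timedelta("1D")
minutes = pd.Series(False, index=pd.date_range(first, last, freq="min"))

# Flag each minute that falls inside a meeting. Label slices include the
# right endpoint, so stop one minute before the meeting's end time.
for row in meetings.itertuples(index=False):
    minutes.loc[row.Start : row.End - one_min] = True

# Minutes spent in meetings per hour; overlapping minutes count once.
hourly = minutes.resample("60min").sum().astype(int)
print(hourly)
```

For this data the hourly values should add up to 600 minutes, compared with the 660 minutes obtained by summing the Duration column directly, and the hours starting at 12:00 and 13:00 on 2016-07-06 should together give the 90 minutes mentioned in the question.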
