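I have read the details of my weekly calendar (subjects changed to protect the innocent, obviously) into a pandas DataFrame. One of my goals is to get the total time spent in meetings. I would like a DataFrame indexed by a date_range with hourly frequency for the week, showing the total number of minutes I was in meetings during each hour. My first challenge is that meetings overlap, and as much as I'd like to be in two places at once, I'm definitely not. I do hop out of one and into another, though. So, for example, the rows at index 8 and 9 should add up to 90 minutes of meeting time, not the 120 minutes I would get if I simply called df['Duration'].sum() on the column. How can I flatten the time periods in the DataFrame so that overlaps are counted only once? There seems to be an answer somewhere using date_range and periods, but I can't wrap my head around it. Below is my DataFrame df.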
Start End Duration Subject
0 07/04/16 10:30:00 07/04/16 11:00:00 30 Inspirational Poster Design Session
1 07/04/16 15:00:00 07/04/16 15:30:00 30 Corporate Speak Do's and Don'ts
2 07/04/16 09:00:00 07/04/16 12:00:00 180 Metrics or Matrix -Panel Discussion
3 07/04/16 13:30:00 07/04/16 15:00:00 90 "Do More with Less" kickoff party
4 07/05/16 09:00:00 07/05/16 10:00:00 60 Fiscal or Physical -Panel Discussion
5 07/05/16 14:00:00 07/05/16 14:30:00 30 "Why we can't have nice thing" training video
6 07/06/16 15:00:00 07/06/16 16:00:00 60 One-on-One with manager -Panel Discussion
7 07/06/16 09:00:00 07/06/16 10:00:00 60 Fireing for Performance leadership session
8 07/06/16 13:00:00 07/06/16 14:00:00 60 Birthday Cake in the conference room *MANDATORY*
9 07/06/16 12:30:00 07/06/16 13:30:00 60 Obligatory lunchtime meeting because it was the only time everyone had avaiable
Any help would be appreciated.
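To make the double counting concrete, here is a minimal sketch that uses only rows 8 and 9 from the table above. It merges overlapping intervals before adding them up, which is one possible way to read "flatten the time periods". The helper name total_minutes_without_overlap is hypothetical and is not part of pandas.

```python
import pandas as pd

# Rows 8 and 9 from the question: they overlap from 13:00 to 13:30.
meetings = pd.DataFrame({
    "Start": pd.to_datetime(["2016-07-06 13:00", "2016-07-06 12:30"]),
    "End": pd.to_datetime(["2016-07-06 14:00", "2016-07-06 13:30"]),
})


def total_minutes_without_overlap(frame):
    """Merge overlapping [Start, End) intervals and return the covered minutes."""
    total = pd.Timedelta(0)
    cur_start = cur_end = None
    ordered = frame.sort_values("Start")[["Start", "End"]]
    for start, end in ordered.itertuples(index=False):
        if cur_end is None or start > cur_end:
            # Gap before this meeting: close the previous merged block, if any.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or back-to-back: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return int(total.total_seconds() // 60)


# Naive total: every meeting counted in full, overlaps included.
naive = int((meetings["End"] - meetings["Start"]).dt.total_seconds().sum() // 60)
merged = total_minutes_without_overlap(meetings)
print(naive, merged)  # 120 90
```

This only gives a single total, not the hourly breakdown asked for, so it is a sanity check on the expected numbers rather than a full solution.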
Edit: Here is the output I am hoping to get for the dataset above.
2016-07-04 00:00:00 0
2016-07-04 01:00:00 0
2016-07-04 02:00:00 0
2016-07-04 03:00:00 0
2016-07-04 04:00:00 0
2016-07-04 05:00:00 0
2016-07-04 06:00:00 0
2016-07-04 07:00:00 0
2016-07-04 08:00:00 0
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
2016-07-06 16:00:00 0
2016-07-06 17:00:00 0
2016-07-06 18:00:00 0
2016-07-06 19:00:00 0
2016-07-06 20:00:00 0
2016-07-06 21:00:00 0
2016-07-06 22:00:00 0
2016-07-06 23:00:00 0
2016-07-07 00:00:00 0
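One possibility is to create a time series indexed by minute (s below) that tracks whether you are in a meeting during each minute, and then resample it by hour. Because each minute is either True or False, overlapping meetings are counted only once. To match your desired output, you can adjust the start and end times of the index of s.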
import io
import pandas as pd
data = io.StringIO('''
Start,End,Duration,Subject
0,07/04/16 10:30:00,07/04/16 11:00:00,30,Inspirational Poster Design Session
1,07/04/16 15:00:00,07/04/16 15:30:00,30,Corporate Speak Do's and Don'ts
2,07/04/16 09:00:00,07/04/16 12:00:00,180,Metrics or Matrix -Panel Discussion
3,07/04/16 13:30:00,07/04/16 15:00:00,90,"Do More with Less" kickoff party
4,07/05/16 09:00:00,07/05/16 10:00:00,60,Fiscal or Physical -Panel Discussion
5,07/05/16 14:00:00,07/05/16 14:30:00,30,"Why we can't have nice thing" training video
6,07/06/16 15:00:00,07/06/16 16:00:00,60,One-on-One with manager -Panel Discussion
7,07/06/16 09:00:00,07/06/16 10:00:00,60,Fireing for Performance leadership session
8,07/06/16 13:00:00,07/06/16 14:00:00,60,Birthday Cake in the conference room *MANDATORY*
9,07/06/16 12:30:00,07/06/16 13:30:00,60,Obligatory lunchtime meeting because it was the only time everyone
''')
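df = pd.read_csv(data, usecols=['Start', 'End', 'Subject'])
# Dates are month/day/year; stating the format avoids day-first ambiguity
df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%y %H:%M:%S')
df['End'] = pd.to_datetime(df['End'], format='%m/%d/%y %H:%M:%S')
# Label slices on a datetime index include the right endpoint, so subtract
# one minute to treat each meeting as the half-open interval [Start, End)
tdel = pd.Timedelta('1min')
# One entry per minute, False until a meeting covers that minute
s = pd.Series(False, index=pd.date_range(start=df['Start'].min(),
                                         end=df['End'].max()-tdel,
                                         freq='min'))
for _, meeting in df.iterrows():
    s[meeting['Start'] : meeting['End']-tdel] = True
# Count the True minutes in each hour; on pandas >= 2.2 spell the alias 'h',
# since the uppercase 'H' is deprecated there
result = s.resample('1H').sum().astype(int)
print(result)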
Output:
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
Freq: H, dtype: int64
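The output above starts at the first meeting and stops at the last one, whereas the desired output in the question runs from midnight on 2016-07-04 through midnight on 2016-07-07. A minimal sketch of the index adjustment mentioned earlier follows. It assumes the index should span whole days, uses only two of the meetings to stay short, and spells the hourly alias 'h' because newer pandas versions deprecate the uppercase form.

```python
import pandas as pd

# Two meetings from the question, enough to show the idea.
df = pd.DataFrame({
    "Start": pd.to_datetime(["2016-07-04 09:00", "2016-07-06 15:00"]),
    "End": pd.to_datetime(["2016-07-04 12:00", "2016-07-06 16:00"]),
})

tdel = pd.Timedelta("1min")

# Assumption: cover whole days, from midnight of the first day
# up to and including midnight after the last day.
first = df["Start"].min().normalize()
last = df["End"].max().normalize() + pd.Timedelta(days=1)
s = pd.Series(False, index=pd.date_range(first, last, freq="min"))

for _, meeting in df.iterrows():
    # Label slicing includes the right endpoint, hence the one-minute offset.
    s.loc[meeting["Start"]: meeting["End"] - tdel] = True

hourly = s.resample("h").sum().astype(int)
print(hourly.index[0], hourly.index[-1], len(hourly))
```

With the full DataFrame from the answer in place of the two-row one, the same index bounds should give the 73 hourly rows listed in the question.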