I have a list of products that were produced or completed, as shown below:
Name | Timestamp Start | Timestamp Stop |
---|---|---|
Product 1 | | |
Product 1 | 2021-01-01 15:30:00 | 2021-01-01 15:55:00 |
Product 1 | 2021-01-02 15:05:00 | 2021-01-02 15:22:00 |
Product 1 | 2021-01-03 15:45:00 | 2021-01-03 15:55:00 |
Input data:
>>> df
Name Start Stop
0 Product 1 2021-01-01 14:49:00 2021-01-01 15:04:00 # OK (overlap 4')
1 Product 1 2021-01-01 15:15:00 2021-01-01 15:37:00 # OK
2 Product 1 2021-01-01 15:30:00 2021-01-01 15:55:00 # OK
3 Product 1 2021-01-02 15:05:00 2021-01-02 15:22:00 # OK
4 Product 1 2021-01-03 15:45:00 2021-01-03 15:55:00 # OK
5 Product 1 2021-01-03 15:51:00 2021-01-03 16:23:00 # OK (overlap 9')
6 Product 1 2021-01-04 14:28:00 2021-01-04 17:12:00 # OK (overlap 60')
7 Product 1 2021-01-05 11:46:00 2021-01-05 13:40:00 # Out of bounds
8 Product 1 2021-01-05 17:20:00 2021-01-05 19:11:00 # Out of bounds
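To run the steps below, the frame first has to exist. Here is one way to rebuild it from the printed output above. The values are copied from that listing; converting with `pd.to_datetime` is an assumption about how the columns were typed, based on the `.dt` accessor used later.

```python
import pandas as pd

# (Start, Stop) pairs copied from the printed frame above
rows = [
    ("2021-01-01 14:49:00", "2021-01-01 15:04:00"),
    ("2021-01-01 15:15:00", "2021-01-01 15:37:00"),
    ("2021-01-01 15:30:00", "2021-01-01 15:55:00"),
    ("2021-01-02 15:05:00", "2021-01-02 15:22:00"),
    ("2021-01-03 15:45:00", "2021-01-03 15:55:00"),
    ("2021-01-03 15:51:00", "2021-01-03 16:23:00"),
    ("2021-01-04 14:28:00", "2021-01-04 17:12:00"),
    ("2021-01-05 11:46:00", "2021-01-05 13:40:00"),
    ("2021-01-05 17:20:00", "2021-01-05 19:11:00"),
]
df = pd.DataFrame(rows, columns=["Start", "Stop"])
df.insert(0, "Name", "Product 1")

# The .dt accessor needs real datetime64 columns, not strings
df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"])
```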
First, remove the out-of-bounds rows (7 and 8):
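import datetime
import pandas as pd

START = datetime.time(15)  # window opens at 15:00
STOP = datetime.time(16)   # window closes at 16:00
# Keep rows whose Start..Stop span reaches the 15:00 hour; .copy() lets us
# add columns to df1 later without a SettingWithCopyWarning.
df1 = df.loc[(df["Start"].dt.floor(freq="h").dt.time <= START)
             & (START <= df["Stop"].dt.floor(freq="h").dt.time),
             ["Start", "Stop"]].copy()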
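Extract the minutes from the `Start` and `Stop` datetimes. If a process started before 15:00, set the minute to 0, because we only want to keep the overlapping part. If the process ended at or after 16:00, set the minute to 59.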
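import numpy as np

# Minute at which the process enters the window (0 if it started before 15:00)
df1["m1"] = np.where(df1["Start"].dt.time > START,
                     df1["Start"].sub(df1["Start"].dt.floor(freq="h"))
                                 .dt.seconds // 60, 0)
# Minute at which it leaves the window (59 if it ran until 16:00 or later)
df1["m2"] = np.where(df1["Stop"].dt.time < STOP,
                     df1["Stop"].sub(df1["Stop"].dt.floor(freq="h"))
                                .dt.seconds // 60, 59)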
>>> df1
Start Stop m1 m2
0 2021-01-01 14:49:00 2021-01-01 15:04:00 0 4
1 2021-01-01 15:15:00 2021-01-01 15:37:00 15 37
2 2021-01-01 15:30:00 2021-01-01 15:55:00 30 55
3 2021-01-02 15:05:00 2021-01-02 15:22:00 5 22
4 2021-01-03 15:45:00 2021-01-03 15:55:00 45 55
5 2021-01-03 15:51:00 2021-01-03 16:23:00 51 59
6 2021-01-04 14:28:00 2021-01-04 17:12:00 0 59
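The same clipping can also be done directly on timestamps, which gives each row's overlap with the 15:00 to 16:00 window as a duration. This is a sketch of an alternative, not the method used in this answer. `window_overlap` is a hypothetical helper, and it assumes that no process crosses midnight.

```python
import pandas as pd

def window_overlap(df, start_hour=15, stop_hour=16):
    """Minutes each row spends inside [start_hour:00, stop_hour:00)."""
    day = df["Start"].dt.normalize()              # midnight of the start day
    win_start = day + pd.Timedelta(hours=start_hour)
    win_stop = day + pd.Timedelta(hours=stop_hour)
    # Clip to the window: keep the later start and the earlier stop
    lo = df["Start"].where(df["Start"] > win_start, win_start)
    hi = df["Stop"].where(df["Stop"] < win_stop, win_stop)
    # Rows entirely outside the window give a negative span, reported as 0
    return (hi - lo).dt.total_seconds().div(60).clip(lower=0)

# Rows 0, 5, 6 and 7 of the input frame
demo = pd.DataFrame({
    "Start": pd.to_datetime(["2021-01-01 14:49", "2021-01-03 15:51",
                             "2021-01-04 14:28", "2021-01-05 11:46"]),
    "Stop": pd.to_datetime(["2021-01-01 15:04", "2021-01-03 16:23",
                            "2021-01-04 17:12", "2021-01-05 13:40"]),
})
overlap = window_overlap(demo)
```

For these four rows the result is 4, 9, 60 and 0 minutes, which matches the overlap comments in the input listing. Note that this measures elapsed time, so it treats `Stop` as exclusive, whereas the minute grid built next also marks the `Stop` minute as busy.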
Create an empty table of shape `len(df1)` x 60 (one column per minute) to store process usage:
out = pd.DataFrame(0, index=df1.index, columns=pd.RangeIndex(60))
Fill the `out` DataFrame:
# .loc slices by label and includes both ends, so minute i2 is marked too
for idx, (i1, i2) in df1[["m1", "m2"]].iterrows():
    out.loc[idx, i1:i2] = 1
>>> out
0 1 2 3 4 5 6 ... 53 54 55 56 57 58 59
0 1 1 1 1 1 0 0 ... 0 0 0 0 0 0 0 # 4'
1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 ... 1 1 1 0 0 0 0
3 0 0 0 0 0 1 1 ... 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 ... 1 1 1 0 0 0 0
5 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 # full hour
[7 rows x 60 columns]
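For a large frame, the row-by-row loop could be replaced by a NumPy broadcast comparison. A minimal sketch, assuming `m1` and `m2` are integer minute bounds with `m2` inclusive, as above. The small `bounds` frame holds the values of rows 0, 1 and 5 of `df1` and is only there to make the example self-contained.

```python
import numpy as np
import pandas as pd

# m1/m2 of rows 0, 1 and 5 from df1 above (m2 inclusive)
bounds = pd.DataFrame({"m1": [0, 15, 51], "m2": [4, 37, 59]})

minutes = np.arange(60)                      # column labels 0..59
m1 = bounds["m1"].to_numpy()[:, None]        # shape (n, 1) so it broadcasts
m2 = bounds["m2"].to_numpy()[:, None]

# True where the minute lies inside [m1, m2] for that row
mask = (m1 <= minutes) & (minutes <= m2)
out = pd.DataFrame(mask.astype(int), index=bounds.index,
                   columns=pd.RangeIndex(60))
```

The comparison builds the whole `n x 60` grid in one step, at the cost of holding a boolean array of that size in memory.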
Finally, compute the number of idle minutes:
>>> # a minute is busy if at least one process covers it
>>> 60 - out.groupby(df1["Start"].dt.date).sum().gt(0).sum(axis="columns")
Start
2021-01-01 14
2021-01-02 42
2021-01-03 45
2021-01-04 0
dtype: int64
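The steps above could be wrapped in a single function. This is a sketch under the same assumptions as the rest of the answer: a single one-hour window, no process crossing midnight, and the `Stop` minute counted as busy. `idle_minutes` is a hypothetical name. It marks a minute as busy when at least one process covers it, so minutes where two processes overlap are counted once.

```python
import numpy as np
import pandas as pd

def idle_minutes(df, hour=15):
    """Idle minutes per day within the one-hour window starting at `hour`."""
    # Equivalent to the filter above: the span must reach into the window's hour
    in_window = (df["Start"].dt.hour <= hour) & (hour <= df["Stop"].dt.hour)
    d = df.loc[in_window]
    # Minute bounds inside the window, clipped to 0 and 59
    m1 = np.where(d["Start"].dt.hour == hour, d["Start"].dt.minute, 0)
    m2 = np.where(d["Stop"].dt.hour == hour, d["Stop"].dt.minute, 59)
    minutes = np.arange(60)
    busy = (m1[:, None] <= minutes) & (minutes <= m2[:, None])
    busy = pd.DataFrame(busy, index=d.index)
    # any(): a minute is busy if at least one process of that day covers it
    per_day = busy.groupby(d["Start"].dt.date).any()
    return 60 - per_day.sum(axis="columns")
```

Calling `idle_minutes(df)` on a frame with datetime `Start` and `Stop` columns returns a Series indexed by date. Days without any process in the window do not appear in the result at all, so they would have to be filled in separately as 60 idle minutes if needed.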
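Note: you have to decide whether the `Stop` datetime is closed (inclusive) or not. As written, the `Stop` minute itself is counted as busy, so a process that ends at 15:04 occupies minutes 0 through 4.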
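To make the difference concrete, here is a small sketch that counts the busy minutes of one process under both conventions. `busy_minutes` and its `closed` flag are hypothetical names used only for illustration.

```python
def busy_minutes(m1, m2, closed=True):
    """Return the set of minute indices covered by a process from m1 to m2.

    closed=True  treats the Stop minute as busy: [m1, m2]
    closed=False treats Stop as exclusive:       [m1, m2)
    """
    end = m2 + 1 if closed else m2
    return set(range(m1, end))

# Row 0 above is clipped to 15:00 and ends at 15:04
print(len(busy_minutes(0, 4, closed=True)))   # 5
print(len(busy_minutes(0, 4, closed=False)))  # 4
```

With the exclusive convention, a process that runs past 16:00 would need `m2 = 60` instead of 59 so that minute 59 is still covered.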