Calculating blocked time within a time range in pandas

I have a list of products that are being produced or have been finished, like this:

Name	Timestamp Start	Timestamp Stop
Product 1
Product 1	2021-01-01 15:30:00	2021-01-01 15:55:00
Product 1	2021-01-02 15:05:00	2021-01-02 15:22:00
Product 1	2021-01-03 15:45:00	2021-01-03 15:55:00

Input data:

>>> df
Name               Start                Stop
0  Product 1 2021-01-01 14:49:00 2021-01-01 15:04:00  # OK (overlap 4')
1  Product 1 2021-01-01 15:15:00 2021-01-01 15:37:00  # OK
2  Product 1 2021-01-01 15:30:00 2021-01-01 15:55:00  # OK
3  Product 1 2021-01-02 15:05:00 2021-01-02 15:22:00  # OK
4  Product 1 2021-01-03 15:45:00 2021-01-03 15:55:00  # OK
5  Product 1 2021-01-03 15:51:00 2021-01-03 16:23:00  # OK (overlap 9')
6  Product 1 2021-01-04 14:28:00 2021-01-04 17:12:00  # OK (overlap 60')
7  Product 1 2021-01-05 11:46:00 2021-01-05 13:40:00  # Out of bounds
8  Product 1 2021-01-05 17:20:00 2021-01-05 19:11:00  # Out of bounds
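To follow along, the frame above can be rebuilt roughly like this. The values are copied from the printout; the `rows` list is just a helper name for this sketch, and it assumes `Start` and `Stop` should be real datetimes rather than strings.

```python
import pandas as pd

# Rows copied from the printed frame; the trailing "# OK" comments are dropped.
rows = [
    ("Product 1", "2021-01-01 14:49:00", "2021-01-01 15:04:00"),
    ("Product 1", "2021-01-01 15:15:00", "2021-01-01 15:37:00"),
    ("Product 1", "2021-01-01 15:30:00", "2021-01-01 15:55:00"),
    ("Product 1", "2021-01-02 15:05:00", "2021-01-02 15:22:00"),
    ("Product 1", "2021-01-03 15:45:00", "2021-01-03 15:55:00"),
    ("Product 1", "2021-01-03 15:51:00", "2021-01-03 16:23:00"),
    ("Product 1", "2021-01-04 14:28:00", "2021-01-04 17:12:00"),
    ("Product 1", "2021-01-05 11:46:00", "2021-01-05 13:40:00"),
    ("Product 1", "2021-01-05 17:20:00", "2021-01-05 19:11:00"),
]
df = pd.DataFrame(rows, columns=["Name", "Start", "Stop"])

# The .dt accessor used later only works on datetime columns.
df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"])
```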

First, remove the out-of-bounds rows (7 and 8):

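import datetime

# Bounds of the time window of interest: 15:00 to 16:00
START = datetime.time(15)
STOP = datetime.time(16)

# Keep rows that start in or before the 15:00 hour and stop in or after it
df1 = df.loc[(df["Start"].dt.floor(freq="H").dt.time <= START)
             & (START <= df["Stop"].dt.floor(freq="H").dt.time),
             ["Start", "Stop"]]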
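Since flooring to the hour and comparing against 15:00 comes down to comparing the hour component, the same filter can probably be written with `.dt.hour`. This is only a sketch on four of the rows; `WINDOW_HOUR`, `mask` and `kept` are names made up for the example, and like the filter above it only looks at the time of day.

```python
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["2021-01-01 14:49", "2021-01-03 15:51",
                             "2021-01-05 11:46", "2021-01-05 17:20"]),
    "Stop": pd.to_datetime(["2021-01-01 15:04", "2021-01-03 16:23",
                            "2021-01-05 13:40", "2021-01-05 19:11"]),
})

WINDOW_HOUR = 15  # hypothetical name: the hour that opens the 15:00-16:00 window

# A row touches the window if it starts in or before that hour
# and stops in or after it.
mask = (df["Start"].dt.hour <= WINDOW_HOUR) & (df["Stop"].dt.hour >= WINDOW_HOUR)
kept = df.loc[mask]
```

The last two rows (11:46 to 13:40 and 17:20 to 19:11) fall outside the window and are dropped.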

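Extract the minute from the Start and Stop datetimes. If the process started before 15:00, set the minute to 0, because we only want to keep the part that overlaps the window. If the process ended after 16:00, set the minute to 59.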

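import numpy as np

# Minute of the hour at which the process starts, or 0 if it started before 15:00
df1["m1"] = np.where(df1["Start"].dt.time > START,
                     df1["Start"].sub(df1["Start"].dt.floor(freq="H"))
                                 .dt.seconds // 60, 0)
# Minute of the hour at which the process stops, or 59 if it stopped at or after 16:00
df1["m2"] = np.where(df1["Stop"].dt.time < STOP,
                     df1["Stop"].sub(df1["Stop"].dt.floor(freq="H"))
                                .dt.seconds // 60, 59)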

>>> df1
Start                Stop  m1  m2
0 2021-01-01 14:49:00 2021-01-01 15:04:00   0   4
1 2021-01-01 15:15:00 2021-01-01 15:37:00  15  37
2 2021-01-01 15:30:00 2021-01-01 15:55:00  30  55
3 2021-01-02 15:05:00 2021-01-02 15:22:00   5  22
4 2021-01-03 15:45:00 2021-01-03 15:55:00  45  55
5 2021-01-03 15:51:00 2021-01-03 16:23:00  51  59
6 2021-01-04 14:28:00 2021-01-04 17:12:00   0  59
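If only the overlap duration per row is needed (the 4', 9' and 60' noted next to the input data), one option is to clamp each interval to the window and subtract. This is a sketch under the assumption that Start and Stop fall on the same date; `win_start`, `win_stop`, `start_c`, `stop_c` and `overlap_min` are names invented here.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Start": pd.to_datetime(["2021-01-01 14:49", "2021-01-03 15:51", "2021-01-04 14:28"]),
    "Stop": pd.to_datetime(["2021-01-01 15:04", "2021-01-03 16:23", "2021-01-04 17:12"]),
})

# Window bounds on each row's own day (assumes Start and Stop share a date).
day = df1["Start"].dt.normalize()
win_start = day + pd.Timedelta(hours=15)
win_stop = day + pd.Timedelta(hours=16)

# Clamp each interval to the window: keep the value where it is inside,
# otherwise fall back to the window bound.
start_c = df1["Start"].where(df1["Start"] > win_start, win_start)
stop_c = df1["Stop"].where(df1["Stop"] < win_stop, win_stop)

# Length of the clamped interval in whole minutes.
overlap_min = (stop_c - start_c).dt.total_seconds().div(60).astype(int)
```

Here `overlap_min` comes out as 4, 9 and 60 for the three rows. Note that this treats the interval as half-open, so it counts one minute less than the inclusive m1..m2 range for rows that stop inside the window.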

Create an empty len(df1) x 60 table (one column per minute of the hour) to record when a process is running:

import pandas as pd
out = pd.DataFrame(0, index=df1.index, columns=pd.RangeIndex(60))

Fill the out DataFrame:

for idx, (i1, i2) in df1[["m1", "m2"]].iterrows():
    out.loc[idx, i1:i2] = 1  # label-based slice, so both i1 and i2 are included

>>> out
0   1   2   3   4   5   6   ...  53  54  55  56  57  58  59
0   1   1   1   1   1   0   0  ...   0   0   0   0   0   0   0  # 4'
1   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0  ...   1   1   1   0   0   0   0
3   0   0   0   0   0   1   1  ...   0   0   0   0   0   0   0
4   0   0   0   0   0   0   0  ...   1   1   1   0   0   0   0
5   0   0   0   0   0   0   0  ...   1   1   1   1   1   1   1
6   1   1   1   1   1   1   1  ...   1   1   1   1   1   1   1  # full hour
[7 rows x 60 columns]
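For larger frames the row-by-row loop can get slow. A broadcasting comparison in NumPy should produce the same table in one step. This is a sketch on three of the rows; `minutes` and `busy` are names made up for the example.

```python
import numpy as np
import pandas as pd

# m1/m2 for rows 0, 1 and 5 of the table above.
df1 = pd.DataFrame({"m1": [0, 15, 51], "m2": [4, 37, 59]}, index=[0, 1, 5])

minutes = np.arange(60)  # shape (60,)

# Column vectors of shape (n, 1) broadcast against the 60 minutes, giving an
# (n, 60) boolean grid. Both ends are inclusive, like the .loc slice.
busy = ((minutes >= df1["m1"].to_numpy()[:, None])
        & (minutes <= df1["m2"].to_numpy()[:, None]))

out = pd.DataFrame(busy.astype(int), index=df1.index, columns=pd.RangeIndex(60))
```

Row 0 ends up with ones in columns 0 through 4 and zeros elsewhere, as in the loop version.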

Finally, count the free minutes per day:

>>> 60 - (out.groupby(df1["Start"].dt.date).sum() & 1).sum(axis="columns")
Start
2021-01-01    22
2021-01-02    42
2021-01-03    50
2021-01-04     0
dtype: int64
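One detail worth knowing about this expression: `& 1` keeps only the lowest bit of each per-minute count, so a minute covered by two rows (count 2) is treated as free. That is why 2021-01-03 gives 50 even though rows 4 and 5 together cover 15:45 to 15:59. If overlapping minutes should count as busy instead, comparing with `.gt(0)` may be closer to what you want. A sketch on the two rows of 2021-01-03, where `free_odd` and `free_union` are names invented here:

```python
import numpy as np
import pandas as pd

# Rows 4 and 5 of the example, with their inclusive minute bounds.
df1 = pd.DataFrame({
    "Start": pd.to_datetime(["2021-01-03 15:45", "2021-01-03 15:51"]),
    "m1": [45, 51],
    "m2": [55, 59],
}, index=[4, 5])

# Build the usage table (1 = process running during that minute).
minutes = np.arange(60)
busy = ((minutes >= df1["m1"].to_numpy()[:, None])
        & (minutes <= df1["m2"].to_numpy()[:, None]))
out = pd.DataFrame(busy.astype(int), index=df1.index, columns=pd.RangeIndex(60))

per_day = out.groupby(df1["Start"].dt.date).sum()

# Expression from the text: minutes covered an even number of times count as free.
free_odd = 60 - (per_day & 1).sum(axis="columns")

# Alternative: any minute covered at least once counts as busy.
free_union = 60 - per_day.gt(0).sum(axis="columns")
```

For this day `free_odd` is 50 and `free_union` is 45. Days without overlapping rows give the same result either way.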

Note: you have to decide whether the Stop datetime is inclusive (a closed interval) or not.
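To make the difference concrete: `.loc` slices by label and includes both ends, so a process running from minute 0 to minute 4 marks five cells, not four. A minimal sketch for a single row, where `closed` and `half_open` are names made up for the example:

```python
import pandas as pd

row = pd.Series(0, index=pd.RangeIndex(60))

# Closed interval: minutes 0, 1, 2, 3 and 4 are all marked.
closed = row.copy()
closed.loc[0:4] = 1

# Half-open interval: the Stop minute itself is left out.
half_open = row.copy()
half_open.loc[0:4 - 1] = 1
```

`closed` has five busy minutes and `half_open` has four. If you go with the half-open reading, rows capped at 59 because they ran past 16:00 would not need the adjustment.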
