熊猫数据帧计算每个组的时间差和两个不同组之间的时间差



我创建了一个这样的数据帧:

import pandas as pd
d = {'Time': ['01.07.2019, 06:21:33', '01.07.2019, 06:32:01', '01.07.2019, 06:57:33', '01.07.2019, 07:24:33','01.07.2019, 08:26:25', '01.07.2019, 09:12:44']
,'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
,'Name': ['Bayer', 'Bayer', 'ITM', 'ITM', 'Geco' , 'Geco'],
'Group': ['1', '1', '2','2','3','3']}
df = pd.DataFrame(data=d)
output:
Time                    Action  Name    Group
0   01.07.2019, 06:21:33    Opened  Bayer   1
1   01.07.2019, 06:32:01    Closed  Bayer   1
2   01.07.2019, 06:57:33    Opened  ITM     2
3   01.07.2019, 07:24:33    Closed  ITM     2
4   01.07.2019, 08:26:25    Opened  Geco    3
5   01.07.2019, 09:12:44    Closed  Geco    3

所以现在我正在尝试计算每个组的时间差和这些组之间的时间差(以分钟为单位)。例如,拜耳组中的时间差应为 10 分 28 秒,拜耳和 ITM 之间的时差应为 25 分 32 秒。之后,同一组之间的时间差应显示在组开始的同一行的列中,两个不同组之间的时间差应显示在组结束的同一行的另一列中。

所以希望的输出将是:

Time                    Action  Name    Group Time Difference(names) Time Difference(groups)
0   01.07.2019, 06:21:33    Opened  Bayer   1          10:28
1   01.07.2019, 06:32:01    Closed  Bayer   1                                   25:32
2   01.07.2019, 06:57:33    Opened  ITM     2          27:00         
3   01.07.2019, 07:24:33    Closed  ITM     2                                   1:01:52
4   01.07.2019, 08:26:25    Opened  Geco    3          46:19
5   01.07.2019, 09:12:44    Closed  Geco    3

我该怎么做?

首先从字符串创建日期时间,然后是一些 groupbys 和差异:

df["Time"] = pd.to_datetime(df["Time"])
df["d1"] = df.groupby("Name")["Time"].diff().shift(-1).fillna("")
df["d2"] = (
df.groupby((df["Action"] == "Closed").cumsum())["Time"]
.diff()
.shift(-1)
.fillna("")
)

生产

|    | Time                | Action   | Name   |   Group | d1              | d2              |
|---:|:--------------------|:---------|:-------|--------:|:----------------|:----------------|
|  0 | 2019-01-07 06:21:33 | Opened   | Bayer  |       1 | 0 days 00:10:28 |                 |
|  1 | 2019-01-07 06:32:01 | Closed   | Bayer  |       1 |                 | 0 days 00:25:32 |
|  2 | 2019-01-07 06:57:33 | Opened   | ITM    |       2 | 0 days 00:46:19 |                 |
|  3 | 2019-01-07 07:24:33 | Closed   | ITM    |       2 |                 | 0 days 01:01:52 |
|  4 | 2019-01-07 08:26:25 | Opened   | Geco   |       3 | 0 days 00:27:00 |                 |
|  5 | 2019-01-07 09:12:44 | Closed   | Geco   |       3 |                 |                 |

为了解释一下计算d2,对于每个新'Closed'行,此(df['Action'] == 'Closed').cumsum()递增 1。为了清楚起见,在这里我将其与Action一起打印,使用此

df['d2_cond'] = (df['Action'] == 'Closed').cumsum()
df[['Action', 'd2_cond']]

指纹


Action  d2_cond
0   Opened  0
1   Closed  1
2   Opened  1
3   Closed  2
4   Opened  2
5   Closed  3

因此,我们可以在此列表中groupby将每个Closed与相应的下一个Opened放在一起

相关内容

  • 没有找到相关文章

最新更新