我创建了一个这样的数据帧:
import pandas as pd
d = {'Time': ['01.07.2019, 06:21:33', '01.07.2019, 06:32:01', '01.07.2019, 06:57:33', '01.07.2019, 07:24:33','01.07.2019, 08:26:25', '01.07.2019, 09:12:44']
,'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
,'Name': ['Bayer', 'Bayer', 'ITM', 'ITM', 'Geco' , 'Geco'],
'Group': ['1', '1', '2','2','3','3']}
df = pd.DataFrame(data=d)
output:
Time Action Name Group
0 01.07.2019, 06:21:33 Opened Bayer 1
1 01.07.2019, 06:32:01 Closed Bayer 1
2 01.07.2019, 06:57:33 Opened ITM 2
3 01.07.2019, 07:24:33 Closed ITM 2
4 01.07.2019, 08:26:25 Opened Geco 3
5 01.07.2019, 09:12:44 Closed Geco 3
所以现在我正在尝试计算每个组的时间差和这些组之间的时间差(以分钟为单位)。例如,拜耳组中的时间差应为 10 分 28 秒,拜耳和 ITM 之间的时差应为 25 分 32 秒。之后,同一组之间的时间差应显示在组开始的同一行的列中,两个不同组之间的时间差应显示在组结束的同一行的另一列中。
所以希望的输出将是:
Time Action Name Group Time Difference(names) Time Difference(groups)
0 01.07.2019, 06:21:33 Opened Bayer 1 10:28
1 01.07.2019, 06:32:01 Closed Bayer 1 25:32
2 01.07.2019, 06:57:33 Opened ITM 2 27:00
3 01.07.2019, 07:24:33 Closed ITM 2 1:01:52
4 01.07.2019, 08:26:25 Opened Geco 3 46:19
5 01.07.2019, 09:12:44 Closed Geco 3
我该怎么做?
首先从字符串创建日期时间,然后是一些 groupbys 和差异:
df["Time"] = pd.to_datetime(df["Time"])
df["d1"] = df.groupby("Name")["Time"].diff().shift(-1).fillna("")
df["d2"] = (
df.groupby((df["Action"] == "Closed").cumsum())["Time"]
.diff()
.shift(-1)
.fillna("")
)
生产
| | Time | Action | Name | Group | d1 | d2 |
|---:|:--------------------|:---------|:-------|--------:|:----------------|:----------------|
| 0 | 2019-01-07 06:21:33 | Opened | Bayer | 1 | 0 days 00:10:28 | |
| 1 | 2019-01-07 06:32:01 | Closed | Bayer | 1 | | 0 days 00:25:32 |
| 2 | 2019-01-07 06:57:33 | Opened | ITM | 2 | 0 days 00:46:19 | |
| 3 | 2019-01-07 07:24:33 | Closed | ITM | 2 | | 0 days 01:01:52 |
| 4 | 2019-01-07 08:26:25 | Opened | Geco | 3 | 0 days 00:27:00 | |
| 5 | 2019-01-07 09:12:44 | Closed | Geco | 3 | | |
为了解释一下计算d2
,对于每个新'Closed'
行,此(df['Action'] == 'Closed').cumsum()
递增 1。为了清楚起见,在这里我将其与Action
一起打印,使用此
df['d2_cond'] = (df['Action'] == 'Closed').cumsum()
df[['Action', 'd2_cond']]
指纹
Action d2_cond
0 Opened 0
1 Closed 1
2 Opened 1
3 Closed 2
4 Opened 2
5 Closed 3
因此,我们可以在此列表中groupby
将每个Closed
与相应的下一个Opened
放在一起