How to reset a sequence when continuity breaks



I have a DataFrame as follows:

ID          Time
10789890    13:04:10
10778370    13:04:11
10778882    13:04:12
10783746    13:04:14
10783746    13:04:15
10780162    13:04:15
10780418    13:04:15
10777346    13:04:15
10779394    13:04:15
10782210    13:04:15
10781186    13:04:15
10776834    13:04:15
10788866    13:04:15
10788354    13:04:15
10783746    13:04:16
10788866    13:04:16
10781442    13:04:16
10788354    13:04:16
10789890    13:04:16
10782210    13:04:16
10793986    13:04:16
10780162    13:04:16
10778882    13:04:16
10789890    13:04:18
10788354    13:04:18
10783746    13:04:18

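I have a column named Time. I want to add a column named SEQ that works as follows: if the Time values are consecutive, the sequence should continue, but if there is a break, it should reset to 1 and continue from there. I would like my output to look like this: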

ID          Time        SEQ
10789890    13:04:10    1
10778370    13:04:11    2
10778882    13:04:12    3
10783746    13:04:14    1
10783746    13:04:15    2
10780162    13:04:15    3
10780418    13:04:15    4
10777346    13:04:15    5
10779394    13:04:15    6
10782210    13:04:15    7
10781186    13:04:15    8
10776834    13:04:15    9
10788866    13:04:15    10
10788354    13:04:15    11
10783746    13:04:16    12
10788866    13:04:16    13
10781442    13:04:16    14
10788354    13:04:16    15
10789890    13:04:16    16
10782210    13:04:16    17
10793986    13:04:16    18
10780162    13:04:16    19
10778882    13:04:16    20
10789890    13:04:18    1
10788354    13:04:18    2
10783746    13:04:18    3

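You can convert the column with to_timedelta, take the difference between rows, and flag every gap greater than one second. The cumsum of those flags gives a group label, and cumcount on those groups gives a grouped incremental count: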

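import pandas as pd

# Flag gaps of more than 1 second; the running total of flags labels each run
s = (pd.to_timedelta(df['Time']).diff().fillna(pd.Timedelta(hours=0))
       .dt.total_seconds().gt(1).cumsum())
df['SEQ'] = df.groupby(s).cumcount().add(1)  # 1-based count within each run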

print(df)
          ID      Time  SEQ
0   10789890  13:04:10    1
1   10778370  13:04:11    2
2   10778882  13:04:12    3
3   10783746  13:04:14    1
4   10783746  13:04:15    2
5   10780162  13:04:15    3
6   10780418  13:04:15    4
7   10777346  13:04:15    5
8   10779394  13:04:15    6
9   10782210  13:04:15    7
10  10781186  13:04:15    8
11  10776834  13:04:15    9
12  10788866  13:04:15   10
13  10788354  13:04:15   11
14  10783746  13:04:16   12
15  10788866  13:04:16   13
16  10781442  13:04:16   14
17  10788354  13:04:16   15
18  10789890  13:04:16   16
19  10782210  13:04:16   17
20  10793986  13:04:16   18
21  10780162  13:04:16   19
22  10778882  13:04:16   20
23  10789890  13:04:18    1
24  10788354  13:04:18    2
25  10783746  13:04:18    3
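
To make the one-liner above easier to follow, here is a minimal sketch that splits it into separate steps and prints the intermediate values. It uses a small subset of the rows from the question. The names gap, is_break and run_id are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Small sample taken from the question's data
df = pd.DataFrame({
    "ID": [10789890, 10778370, 10778882, 10783746, 10783746, 10780162, 10789890],
    "Time": ["13:04:10", "13:04:11", "13:04:12", "13:04:14",
             "13:04:15", "13:04:15", "13:04:18"],
})

# Step 1: parse the strings and take the gap to the previous row, in seconds
# (the first row has no previous row, so its gap is filled with 0)
gap = pd.to_timedelta(df["Time"]).diff().dt.total_seconds().fillna(0)

# Step 2: a gap larger than 1 second marks the start of a new run
is_break = gap.gt(1)

# Step 3: the running total of breaks gives every run its own label
run_id = is_break.cumsum()

# Step 4: number the rows inside each run, starting from 1
df["SEQ"] = df.groupby(run_id).cumcount().add(1)

# Show the intermediate values next to the result
print(df.assign(gap=gap, is_break=is_break, run_id=run_id))
```

Rows with the same Time have a gap of 0, so they stay in the current run, which matches the expected output in the question.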

Note: if the Time column is already a timedelta, skip the pd.to_timedelta part and just use df['Time'].diff().fillna(... instead of pd.to_timedelta(df['Time']).diff().fillna(...
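
As a rough sketch of that variant, the snippet below assumes Time already holds timedelta values and compares the gaps directly against a pd.Timedelta, so no conversion to seconds is needed. The helper add_seq and its parameters time_col and max_gap are hypothetical names introduced here for illustration. They are not part of pandas or of the original answer.

```python
import pandas as pd

def add_seq(df, time_col="Time", max_gap=pd.Timedelta(seconds=1)):
    """Hypothetical helper: number rows within runs whose gaps are <= max_gap."""
    # The column is assumed to be timedelta already, so diff() can be used directly
    gaps = df[time_col].diff().fillna(pd.Timedelta(0))
    # Each gap above the threshold starts a new run
    run_id = gaps.gt(max_gap).cumsum()
    out = df.copy()
    out["SEQ"] = out.groupby(run_id).cumcount().add(1)
    return out

# Sample with a timedelta column, using times from the question
df_td = pd.DataFrame({
    "Time": pd.to_timedelta(["13:04:10", "13:04:11", "13:04:12",
                             "13:04:14", "13:04:15"]),
})
result = add_seq(df_td)
print(result)
```

Making the threshold a parameter is a design choice of this sketch. The original answer hard-codes a one-second gap.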
