Increment a counter within each group the first time a number is reached



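I found this post very useful, and I am trying to do the same thing within groups.

Here is the original post, which increments a counter each time df['step'] reaches 6: link

In my case, I want to increment the counter each time a 1 appears.

So I changed this statement: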

df['counter'] = ((df['step']==6) & (df.shift(1)['step']!=6 )).cumsum()

to this:

df['counter_2'] = ((df['counter1'] == 1) & (df.shift(1)['counter1'] != 1)).cumsum()

Now I am trying to do this per group, grouping by 'prd_id'.

Updated answer

df['counter'] = df['step'].eq(1).groupby(df['prd_id']).cumsum()

Output:

   prd_id  step  counter
0       A     1        1
1       A     2        1
2       A     3        1
3       A     4        1
4       A     1        2
5       A     2        2
6       B     1        1
7       B     1        2
8       B     2        2
9       B     1        3
10      B     2        3
11      B     3        3
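
For anyone who wants to reproduce the table above, here is a minimal sketch. The sample frame is reconstructed from the printed output, so treat it as an assumption about the asker's data. The second column, `counter_runs`, is a hypothetical variant that is not part of the answer: it counts only the first 1 of a consecutive run, which is what the shift-based check in the question does, applied per group.

```python
import pandas as pd

# Sample data reconstructed from the output table (an assumption).
df = pd.DataFrame({
    "prd_id": list("AAAAAABBBBBB"),
    "step":   [1, 2, 3, 4, 1, 2, 1, 1, 2, 1, 2, 3],
})

# Count every 1 within each prd_id (same idea as the one-liner above).
is_one = df["step"].eq(1)
df["counter"] = is_one.astype(int).groupby(df["prd_id"]).cumsum()

# Hypothetical variant: count only the first 1 of a consecutive run.
# The previous step is taken per group, so a 1 in the first row of a
# group always starts a new run (its previous value is NaN).
prev_step = df.groupby("prd_id")["step"].shift()
run_start = is_one & prev_step.ne(1)
df["counter_runs"] = run_start.astype(int).groupby(df["prd_id"]).cumsum()

print(df)
```

The two columns differ only where 1s repeat back to back, as in rows 6 and 7 of group B.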

Original answer

You can use duplicated, boolean NOT (~), and cumsum:

df['counter'] = (~df['step'].duplicated()).cumsum()

Output:

    step  counter
0      2        1
1      2        1
2      2        1
3      3        2
4      4        3
5      4        3
6      5        4
7      6        5
8      6        5
9      6        5
10     6        5
11     7        6
12     5        6  # not incrementing, 5 was seen above
13     6        6  # not incrementing, 6 was seen above
14     6        6
15     6        6
16     7        6  # not incrementing, 7 was seen above
17     5        6  # not incrementing, 5 was seen above
18     6        6  # not incrementing, 6 was seen above
19     7        6  # not incrementing, 7 was seen above
20     5        6  # not incrementing, 5 was seen above
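
The output above can be reproduced with a short self-contained sketch. The `step` values are copied from the table; the `first_seen` name is only an illustrative helper. `duplicated()` marks every repeat of a value that has already appeared, so negating it leaves True only on the first occurrence, and the cumulative sum increases exactly there.

```python
import pandas as pd

# Step values copied from the output table above.
df = pd.DataFrame({
    "step": [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5],
})

# True only on the first occurrence of each distinct step value.
first_seen = ~df["step"].duplicated()

# Running count of distinct values seen so far.
df["counter"] = first_seen.astype(int).cumsum()

print(df)
```

Note that this counts distinct values, so the counter stops growing once every value has been seen at least once.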

If you also have groups, use:

df['counter'] = (~df[['step', 'group']].duplicated()).groupby(df['group']).cumsum()

Example:

   group  step  counter
0      A     1        1
1      A     2        2
2      A     2        2
3      A     3        3
4      A     2        3
5      A     4        4
6      B     1        1  # first time in B
7      B     1        1
8      B     2        2
9      B     1        2  # duplicated in B
10     B     2        2
11     B     3        3
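
A runnable sketch of the grouped version, with sample data copied from the example table (the `first_in_group` name is only illustrative). Checking duplicates on the pair (`step`, `group`) means a value counts as new the first time it appears in each group, even if another group has already seen it.

```python
import pandas as pd

# Sample data copied from the example table above.
df = pd.DataFrame({
    "group": list("AAAAAABBBBBB"),
    "step":  [1, 2, 2, 3, 2, 4, 1, 1, 2, 1, 2, 3],
})

# True on the first occurrence of each (step, group) pair.
first_in_group = ~df[["step", "group"]].duplicated()

# Cumulative sum restarted for every group.
df["counter"] = first_in_group.astype(int).groupby(df["group"]).cumsum()

print(df)
```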
