条件累积计数熊猫,同时保留第一次更改前的值



我使用Pandas,我试图创建一个列,其中值根据时间列的条件增加,特别是重置输入数据:

Out[73]: 
ID    Time  Job Level   Counter
0      1    17     a
1      1    18     a
2      1    19     a
3      1    20     a
4      1    21     a 
5      1    22     b
6      1    23.    b 
7      1    24.    b 
8      2    10.    a
9      2    11     a
10     2    12     a
11     2    13     a
12     2    14.    b
13     2    15     b 
14     2    16     b
15     2    17     c
16     2    18     c 

我想创建一个新的向量'count',其中每个ID组中的值在第一次更改之前保持不变,并且每次遇到作业级别更改时从零开始,同时保持等于第一次更改之前的时间或没有更改。

我想要什么:

ID    Time  Job Level     Counter
0      1    17     a          17
1      1    18     a          18
2      1    19     a          19
3      1    20     a          20
4      1    21     a          21
5      1    22     b          0
6      1    23     b          1 
7      1    24     b          2 
8      2    10     a          10
9      2    11     a          11
10     2    12     a          12
11     2    13     a          13
12     2    14     b          0 
13     2    15     b          1 
14     2    16     b          2
15     2    17     c          0
16     2    18     c          1

这是我尝试过的

df = df.sort_values(['ID']).reset_index(drop=True)
df['Counter'] = promo_details.groupby('ID')['job_level'].apply(lambda x: x.shift()!=x)

def func(group):
group.loc[group.index[0],'Counter']=group.loc[group.index[0],'time_in_level']
return group
df = df.groupby('emp_id').apply(func)
df['Counter'] = df['Counter'].replace(True,'a')
df['Counter'] = np.where(df.Counter == False,df['Time'],df['Counter']) 
df['Counter'] = df['Counter'].replace('a',0)

这不是在第一次更改之后创建一个累积更改,同时保留它之前的计数,

使用GroupBy.cumcount作为计数器的过滤器第一组-有来自Time列的添加值:

#if need test consecutive duplicates
s = df['Job Level'].ne(df['Job Level'].shift()).cumsum()
m = s.groupby(df['ID']).transform('first').eq(s)
df['Counter'] = np.where(m, df['Time'], df.groupby(['ID', s]).cumcount())
print (df)
ID  Time Job Level  Counter
0    1    17         a       17
1    1    18         a       18
2    1    19         a       19
3    1    20         a       20
4    1    21         a       21
5    1    22         b        0
6    1    23         b        1
7    1    24         b        2
8    2    10         a       10
9    2    11         a       11
10   2    12         a       12
11   2    13         a       13
12   2    14         b        0
13   2    15         b        1
14   2    16         b        2
15   2    17         c        0
16   2    18         c        1

或:

#if each groups are unique
m = df.groupby('ID')['Job Level'].transform('first').eq(df['Job Level'])
df['Counter'] = np.where(m, df['Time'], df.groupby(['ID', 'Job Level']).cumcount())

变化数据的差异:

print (df)
ID  Time Job Level
12   2    14         b
13   2    15         b
14   2    16         b
15   2    17         c
16   2    18         c
10   2    12         a
11   2    18         a
12   2    19         b
13   2    20         b
#if need test consecutive duplicates
s = df['Job Level'].ne(df['Job Level'].shift()).cumsum()
m = s.groupby(df['ID']).transform('first').eq(s)
df['Counter1'] = np.where(m, df['Time'], df.groupby(['ID', s]).cumcount())
m = df.groupby('ID')['Job Level'].transform('first').eq(df['Job Level'])
df['Counter2'] = np.where(m, df['Time'], df.groupby(['ID', 'Job Level']).cumcount())
print (df)
ID  Time Job Level  Counter1  Counter2
12   2    14         b        14        14
13   2    15         b        15        15
14   2    16         b        16        16
15   2    17         c         0         0
16   2    18         c         1         1
10   2    12         a         0         0
11   2    18         a         1         1
12   2    19         b         0        19
13   2    20         b         1        20

相关内容

最新更新