我正在处理一个熊猫数据帧,如下所示:
Items Run Status
T1 R1 OK
T2 R1 OK
T3 R1 OK
T4 R1 OK
T1 R2 NOK
T2 R2 OK
T3 R2 NOK
T4 R2 OK
T1 R3 OK
T2 R3 NOK
T3 R3 OK
T4 R3 OK
我想捕捉每个项目发生的状态转换(OK到NOK到OK,反之亦然(,以及状态转换发生的次数,如下所示:
Items Run Status Transition Count of Transition per Items
T1 R1 OK 0 1
T1 R2 NOK 1
T1 R3 OK 0
T2 R1 OK 0 1
T2 R2 OK 0
T2 R3 NOK 1
T3 R1 OK 0 2
T3 R2 NOK 1
T3 R3 OK 1
T4 R1 OK 0 0
T4 R2 OK 0
T4 R3 OK 0
IIUC,您可以使用:
# sort values by Items and Run
df = df.sort_values(by=['Items', 'Run'])
# identify changes in Status
m = df['Status'].ne(df.groupby('Items')['Status'].shift().bfill())
# convert boolean to integer
df['Transition'] = m.astype(int)
# count per group masking the non-first values with empty string
df['Count'] = (m.groupby(df['Items']).transform('sum')
.mask(df['Items'].duplicated(), '') # optional
)
输出:
Items Run Status Transition Count
0 T1 R1 OK 0 2
4 T1 R2 NOK 1
8 T1 R3 OK 1
1 T2 R1 OK 0 1
5 T2 R2 OK 0
9 T2 R3 NOK 1
2 T3 R1 OK 0 2
6 T3 R2 NOK 1
10 T3 R3 OK 1
3 T4 R1 OK 0 0
7 T4 R2 OK 0
11 T4 R3 OK 0