Pandas: changing the values of a group under a condition



My df:

import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[1,1,1,2,2],
'time':['2020-01-01 12:00:15','2020-01-01 12:00:30','2020-01-01 12:00:45','2020-01-03 08:00:00','2020-01-03 08:00:15'],
'time1':['2020-01-01 12:00:00','2020-01-01 12:00:00','2020-01-01 12:00:00','2020-01-01 12:00:00','2020-01-01 12:00:00'],
'numb':[1,5,8,0,4]})
df['time'] = pd.to_datetime(df['time'])
df['time1'] = pd.to_datetime(df['time1'])
df['numb_diff'] = df['numb'] - df['numb'].shift()

Output:

id                 time               time1 numb    numb_diff
0    1  2020-01-01 12:00:15 2020-01-01 12:00:00    1          NaN
1    1  2020-01-01 12:00:30 2020-01-01 12:00:00    5          4.0
2    1  2020-01-01 12:00:45 2020-01-01 12:00:00    8          3.0
3    2  2020-01-03 08:00:00 2020-01-01 12:00:00    0         -8.0
4    2  2020-01-03 08:00:15 2020-01-01 12:00:00    4          4.0

Now I want to set time1 to the minimum time value of the group (id), but only if the first numb_diff entry for that id is < 0.

Expected output:

id                 time               time1 numb    numb_diff
0    1  2020-01-01 12:00:15 2020-01-01 12:00:00    1          NaN
1    1  2020-01-01 12:00:30 2020-01-01 12:00:00    5          4.0
2    1  2020-01-01 12:00:45 2020-01-01 12:00:00    8          3.0
3    2  2020-01-03 08:00:00 2020-01-03 08:00:00    0         -8.0    #Changing time1 to the min of time the group(id = 2)
4    2  2020-01-03 08:00:15 2020-01-03 08:00:00    4          4.0

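Let's create a boolean mask for the condition that numb_diff is less than zero, then group this mask by id and transform it with first, so that every row in a group carries the flag of that group's first row. Finally, use boolean indexing with this mask to replace the values in time1: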

m = df['numb_diff'].lt(0).groupby(df['id']).transform('first')
df.loc[m, 'time1'] = df.groupby('id')['time'].transform('min')

Output:

id                time               time1  numb  numb_diff
0   1 2020-01-01 12:00:15 2020-01-01 12:00:00     1        NaN
1   1 2020-01-01 12:00:30 2020-01-01 12:00:00     5        4.0
2   1 2020-01-01 12:00:45 2020-01-01 12:00:00     8        3.0
3   2 2020-01-03 08:00:00 2020-01-03 08:00:00     0       -8.0
4   2 2020-01-03 08:00:15 2020-01-03 08:00:00     4        4.0
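
The steps above can also be written out one at a time, which makes it easier to inspect each intermediate result. This is only a sketch of the same idea under a few assumptions: the names is_negative and group_min are illustrative, Series.diff() stands in for the numb - numb.shift() expression, and Series.mask is swapped in for the .loc assignment.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'time': pd.to_datetime(['2020-01-01 12:00:15', '2020-01-01 12:00:30',
                            '2020-01-01 12:00:45', '2020-01-03 08:00:00',
                            '2020-01-03 08:00:15']),
    'time1': pd.to_datetime(['2020-01-01 12:00:00'] * 5),
    'numb': [1, 5, 8, 0, 4],
})
# Same values as df['numb'] - df['numb'].shift()
df['numb_diff'] = df['numb'].diff()

# True where the difference is negative (NaN compares as False)
is_negative = df['numb_diff'].lt(0)

# Copy each group's first flag onto every row of that group
m = is_negative.groupby(df['id']).transform('first').astype(bool)

# Earliest time per id, aligned to the original rows
group_min = df.groupby('id')['time'].transform('min')

# Where the mask is True take the group minimum, otherwise keep time1
df['time1'] = df['time1'].mask(m, group_min)

print(df)
```

Series.mask returns a new Series instead of assigning in place, so the original time1 column stays untouched until the last line.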
