My df:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[1,1,1,2,2],
'time':['2020-01-01 12:00:15','2020-01-01 12:00:30','2020-01-01 12:00:45','2020-01-03 08:00:00','2020-01-03 08:00:15'],
'time1':['2020-01-01 12:00:00','2020-01-01 12:00:00','2020-01-01 12:00:00','2020-01-01 12:00:00','2020-01-01 12:00:00'],
'numb':[1,5,8,0,4]})
df['time'] = pd.to_datetime(df['time'])
df['time1'] = pd.to_datetime(df['time1'])
df['numb_diff'] = df['numb'] - df['numb'].shift()
Output:
id time time1 numb numb_diff
0 1 2020-01-01 12:00:15 2020-01-01 12:00:00 1 NaN
1 1 2020-01-01 12:00:30 2020-01-01 12:00:00 5 4.0
2 1 2020-01-01 12:00:45 2020-01-01 12:00:00 8 3.0
3 2 2020-01-03 08:00:00 2020-01-01 12:00:00 0 -8.0
4 2 2020-01-03 08:00:15 2020-01-01 12:00:00 4 4.0
Now I want to set time1 to the minimum time value of the group (id), but only if the first numb_diff entry for that id is < 0.
Expected output:
id time time1 numb numb_diff
0 1 2020-01-01 12:00:15 2020-01-01 12:00:00 1 NaN
1 1 2020-01-01 12:00:30 2020-01-01 12:00:00 5 4.0
2 1 2020-01-01 12:00:45 2020-01-01 12:00:00 8 3.0
3 2 2020-01-03 08:00:00 2020-01-03 08:00:00 0 -8.0 # time1 changed to the min of time within the group (id = 2)
4 2 2020-01-03 08:00:15 2020-01-03 08:00:00 4 4.0
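Let's create a boolean mask for the condition that numb_diff is less than zero, then group this mask by id and transform it with first, so that every row of a group carries the result for that group's first entry. Finally, use boolean indexing with this mask to replace the values in time1: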
m = df['numb_diff'].lt(0).groupby(df['id']).transform('first')
df.loc[m, 'time1'] = df.groupby('id')['time'].transform('min')
Result:
id time time1 numb numb_diff
0 1 2020-01-01 12:00:15 2020-01-01 12:00:00 1 NaN
1 1 2020-01-01 12:00:30 2020-01-01 12:00:00 5 4.0
2 1 2020-01-01 12:00:45 2020-01-01 12:00:00 8 3.0
3 2 2020-01-03 08:00:00 2020-01-03 08:00:00 0 -8.0
4 2 2020-01-03 08:00:15 2020-01-03 08:00:00 4 4.0
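
For reference, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds the frame from the question and keeps the intermediate pieces in named variables so they can be inspected. The names group_min and result are my own, and using Series.mask instead of the df.loc assignment is just an equivalent alternative that returns a new frame rather than modifying df in place. The astype(bool) call is a defensive assumption in case the transformed mask does not come back with a boolean dtype.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'time': pd.to_datetime([
        '2020-01-01 12:00:15', '2020-01-01 12:00:30', '2020-01-01 12:00:45',
        '2020-01-03 08:00:00', '2020-01-03 08:00:15',
    ]),
    'time1': pd.to_datetime(['2020-01-01 12:00:00'] * 5),
    'numb': [1, 5, 8, 0, 4],
})

# Same as df['numb'] - df['numb'].shift(); note the diff is NOT grouped,
# so the first row of id 2 is compared with the last row of id 1.
df['numb_diff'] = df['numb'].diff()

# True on every row of an id whose first numb_diff entry is negative.
# NaN < 0 evaluates to False, so id 1 is left alone.
m = df['numb_diff'].lt(0).groupby(df['id']).transform('first').astype(bool)

# Earliest time per id, broadcast back to the original row index.
group_min = df.groupby('id')['time'].transform('min')

# mask() replaces values where m is True and keeps the rest unchanged.
result = df.assign(time1=df['time1'].mask(m, group_min))
print(result)
```

Here m is False for all rows of id 1 and True for all rows of id 2, so only the second group has time1 replaced.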