Look up a row's value based on another column and compute the difference



I have a DataFrame:

import pandas as pd
data = pd.DataFrame({'start':['2020-08-01','2020-08-02','2020-08-03','2020-08-04','2020-08-05','2020-08-06','2020-08-07','2020-08-08'],
'end':['2020-08-03','2020-08-03','2020-08-06','2020-08-06','2020-08-06','2020-08-08','2020-08-08','2020-08-08'],
'score':[74, 81, 38, 49, 79, 17, 53, 69]})

For each row, I need the difference between its score and the score of the row whose start date equals that row's end date:

         start         end  score  result
0   2020-08-01  2020-08-03     74      36  # 74-38 as score on 08/03 is 38
1   2020-08-02  2020-08-03     81      43  # 81-38
2   2020-08-03  2020-08-06     38      21  # 38-17 as score on 08/06 is 17
3   2020-08-04  2020-08-06     49      32  # 49-17
4   2020-08-05  2020-08-06     79      62  # 79-17
5   2020-08-06  2020-08-08     17     -52  # 17-69 as score on 08/08 is 69
6   2020-08-07  2020-08-08     53     -16  # 53-69
7   2020-08-08  2020-08-08     69       0  # 69-69

Is there a good pandas way to do this? Thanks a lot!

If all start values are unique, map each end date to the score recorded for that start date, then subtract:

data['result'] = data['score'].sub(data['end'].map(data.set_index('start')['score']))
print(data)
        start         end  score  result
0  2020-08-01  2020-08-03     74      36
1  2020-08-02  2020-08-03     81      43
2  2020-08-03  2020-08-06     38      21
3  2020-08-04  2020-08-06     49      32
4  2020-08-05  2020-08-06     79      62
5  2020-08-06  2020-08-08     17     -52
6  2020-08-07  2020-08-08     53     -16
7  2020-08-08  2020-08-08     69       0

Details

print(data['end'].map(data.set_index('start')['score']))
0    38
1    38
2    17
3    17
4    17
5    69
6    69
7    69
Name: end, dtype: int64
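
The mapping above relies on start being unique, because map needs a lookup Series with a unique index. As a minimal sketch of one way to handle repeated start dates, the example below keeps a single score per start date before building the lookup. The sample data and the choice of keep='last' are assumptions for illustration only and are not part of the original question; which duplicate to keep depends on your data.

```python
import pandas as pd

# Hypothetical data: '2020-08-02' appears twice as a start date,
# and the end date '2020-08-05' has no matching start date.
data = pd.DataFrame({
    'start': ['2020-08-01', '2020-08-02', '2020-08-02', '2020-08-03'],
    'end':   ['2020-08-02', '2020-08-03', '2020-08-03', '2020-08-05'],
    'score': [74, 81, 60, 38],
})

# Keep one score per start date (assumption: the last one wins)
# so that the lookup Series has a unique index.
lookup = data.drop_duplicates('start', keep='last').set_index('start')['score']

# End dates without a matching start date map to NaN,
# so the result column becomes float instead of int.
data['result'] = data['score'].sub(data['end'].map(lookup))
print(data)
```

Rows whose end date never appears as a start date get NaN in result. Whether to leave them, fill them, or drop them is a separate decision.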
