我有一个数据帧:
import pandas as pd
data = pd.DataFrame({'start':['2020-08-01','2020-08-02','2020-08-03','2020-08-04','2020-08-05','2020-08-06','2020-08-07','2020-08-08'],
'end':['2020-08-03','2020-08-03','2020-08-06','2020-08-06','2020-08-06','2020-08-08','2020-08-08','2020-08-08'],
'score':[74, 81, 38, 49, 79, 17, 53, 69]})
我需要将start
日期与其对应的end
日期之间的score
差计算为:
start end score result
0 2020-08-01 2020-08-03 74 36 # 74-38 as score on 08/03 is 38
1 2020-08-02 2020-08-03 81 43 # 81-38
2 2020-08-03 2020-08-06 38 21 # 38-17 as score on 08/06 is 17
3 2020-08-04 2020-08-06 49 32 # 49-17
4 2020-08-05 2020-08-06 79 62 # 79-17
5 2020-08-06 2020-08-08 17 -52 # 17-69 as score on 08/08 is 69
6 2020-08-07 2020-08-08 53 -16 # 53-69
7 2020-08-08 2020-08-08 69 0 # 69-69
有没有一种好的pandas
方法可以做到这一点?非常感谢!
如果所有start
值都是唯一的,则使用映射值相减:
data['result'] = data['score'].sub(data['end'].map(data.set_index('start')['score']))
print (data)
start end score result
0 2020-08-01 2020-08-03 74 36
1 2020-08-02 2020-08-03 81 43
2 2020-08-03 2020-08-06 38 21
3 2020-08-04 2020-08-06 49 32
4 2020-08-05 2020-08-06 79 62
5 2020-08-06 2020-08-08 17 -52
6 2020-08-07 2020-08-08 53 -16
7 2020-08-08 2020-08-08 69 0
详细信息:
print (data['end'].map(data.set_index('start')['score']))
0 38
1 38
2 17
3 17
4 17
5 69
6 69
7 69
Name: end, dtype: int64