I have a pandas DataFrame, values, that looks like this:
person | date | value
-------|------------|------
A | 01-01-2020 | 1
A | 01-08-2020 | 2
A | 01-12-2020 | 3
B | 01-02-2020 | 4
B | 01-05-2020 | 5
B | 01-06-2020 | 6
And another DataFrame, encounters, that looks like this:
person | date
-------|------------
A | 01-01-2020
A | 01-03-2020
A | 01-06-2020
A | 01-11-2020
A | 01-12-2020
A | 01-15-2020
B | 01-01-2020
B | 01-04-2020
B | 01-06-2020
B | 01-08-2020
B | 01-09-2020
B | 01-10-2020
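What I ultimately want is a merged DataFrame that adds a third column to the encounters dataset, holding the most recent value for the corresponding person as of each encounter date (shown below). Is there a simple way to do this in pandas?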
person | date | most_recent_value
-------|------------|-------------------
A | 01-01-2020 | 1
A | 01-03-2020 | 1
A | 01-06-2020 | 1
A | 01-11-2020 | 2
A | 01-12-2020 | 3
A | 01-15-2020 | 3
B | 01-01-2020 | None
B | 01-04-2020 | 4
B | 01-06-2020 | 6
B | 01-08-2020 | 6
B | 01-09-2020 | 6
B | 01-10-2020 | 6
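To pin down the requested semantics, here is a brute-force sketch (not part of the original question) that, for each encounter, looks up the latest values row for the same person dated on or before the encounter date, and yields None when there is none. It scans values once per encounter, so it is only meant as a reference for checking faster approaches. The helper name most_recent_value and the explicit "%m-%d-%Y" date format are assumptions based on the sample data.

```python
import pandas as pd

values = pd.DataFrame({
    "person": ["A", "A", "A", "B", "B", "B"],
    "date": ["01-01-2020", "01-08-2020", "01-12-2020",
             "01-02-2020", "01-05-2020", "01-06-2020"],
    "value": [1, 2, 3, 4, 5, 6],
})
encounters = pd.DataFrame({
    "person": ["A"] * 6 + ["B"] * 6,
    "date": ["01-01-2020", "01-03-2020", "01-06-2020", "01-11-2020",
             "01-12-2020", "01-15-2020", "01-01-2020", "01-04-2020",
             "01-06-2020", "01-08-2020", "01-09-2020", "01-10-2020"],
})

# Assumption: the dates are month-day-year strings, so parse them explicitly.
values["date"] = pd.to_datetime(values["date"], format="%m-%d-%Y")
encounters["date"] = pd.to_datetime(encounters["date"], format="%m-%d-%Y")


def most_recent_value(person, date):
    """Hypothetical helper: latest value for `person` dated on or before `date`, else None."""
    candidates = values[(values["person"] == person) & (values["date"] <= date)]
    if candidates.empty:
        return None
    # The last row after sorting by date is the most recent one.
    return candidates.sort_values("date")["value"].iloc[-1]


encounters["most_recent_value"] = [
    most_recent_value(p, d) for p, d in zip(encounters["person"], encounters["date"])
]
print(encounters)
```

Note that an encounter on the same day as a measurement (B on 01-06-2020) picks up that day's value, which matches the desired output.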
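This is essentially what merge_asof does: for each encounter, it takes the last values row for the same person whose date is on or before the encounter date.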
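import numpy as np
import pandas as pd

# the dates are month-day-year strings
values['date'] = pd.to_datetime(values['date'], format='%m-%d-%Y')
encounters['date'] = pd.to_datetime(encounters['date'], format='%m-%d-%Y')

# merge_asof needs both frames sorted on the 'on' key; 'rank' remembers
# the original order of encounters so it can be restored afterwards
(pd.merge_asof(encounters.assign(rank=np.arange(encounters.shape[0]))
                         .sort_values('date'),
               values.sort_values('date'),
               by='person', on='date')
   .sort_values('rank')
   .drop('rank', axis=1)
)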
Output:
person date value
0 A 2020-01-01 1.0
2 A 2020-01-03 1.0
4 A 2020-01-06 1.0
9 A 2020-01-11 2.0
10 A 2020-01-12 3.0
11 A 2020-01-15 3.0
1 B 2020-01-01 NaN
3 B 2020-01-04 4.0
5 B 2020-01-06 6.0
6 B 2020-01-08 6.0
7 B 2020-01-09 6.0
8 B 2020-01-10 6.0
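For completeness, here is a self-contained sketch that rebuilds the sample frames, runs the same merge_asof, and renames the result column to most_recent_value to match the layout asked for in the question. direction="backward" is already merge_asof's default and is spelled out only for clarity; the rename and reset_index steps are additions that are not in the original answer. Missing matches come out as NaN (so the column is float), not None.

```python
import numpy as np
import pandas as pd

values = pd.DataFrame({
    "person": ["A", "A", "A", "B", "B", "B"],
    "date": ["01-01-2020", "01-08-2020", "01-12-2020",
             "01-02-2020", "01-05-2020", "01-06-2020"],
    "value": [1, 2, 3, 4, 5, 6],
})
encounters = pd.DataFrame({
    "person": ["A"] * 6 + ["B"] * 6,
    "date": ["01-01-2020", "01-03-2020", "01-06-2020", "01-11-2020",
             "01-12-2020", "01-15-2020", "01-01-2020", "01-04-2020",
             "01-06-2020", "01-08-2020", "01-09-2020", "01-10-2020"],
})

# Assumption: month-day-year date strings.
values["date"] = pd.to_datetime(values["date"], format="%m-%d-%Y")
encounters["date"] = pd.to_datetime(encounters["date"], format="%m-%d-%Y")

result = (
    pd.merge_asof(
        # Remember the original row order, then sort on the as-of key as merge_asof requires.
        encounters.assign(rank=np.arange(len(encounters))).sort_values("date"),
        values.sort_values("date"),
        by="person",           # only match rows belonging to the same person
        on="date",             # the as-of key
        direction="backward",  # latest values row with date <= encounter date (the default)
    )
    .sort_values("rank")       # restore the original encounters order
    .drop(columns="rank")
    .rename(columns={"value": "most_recent_value"})
    .reset_index(drop=True)
)
print(result)
```

Because merge_asof matches each left row in a single pass over the sorted right frame within each person, it avoids the per-row filtering a naive lookup would need.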