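I have two DataFrames, coarse and fine. fine contains four columns: start_time, end_time, start_price, end_price. coarse contains start_time and end_time. Note that all times are pandas Timestamp objects, e.g. 2016-12-12 01:03:13.15231+00:00.
For coarse, I need to add two extra columns, start_price and end_price, where coarse.start_price is the fine.start_price of the row whose fine.start_time is closest to coarse.start_time, and similarly for end_price.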
Here is a clear example:
coarse:
start_time end_time
2016-12-12 01:00:00.000+00:00 2016-12-12 02:00:00.000+00:00
2016-12-12 02:00:00.000+00:00 2016-12-12 03:00:00.000+00:00
2016-12-12 03:00:00.000+00:00 2016-12-12 03:30:00.000+00:00
fine:
start_time end_time start_price
2016-12-12 00:59:00.000+00:00 2016-12-12 01:12:00.000+00:00 2.3
2016-12-12 01:12:00.000+00:00 2016-12-12 01:15:00.000+00:00 4.5
2016-12-12 01:15:00.000+00:00 2016-12-12 01:45:00.000+00:00 5.7
2016-12-12 01:45:00.000+00:00 2016-12-12 01:55:00.000+00:00 8.8
2016-12-12 01:55:00.000+00:00 2016-12-12 02:15:00.000+00:00 9.9
2016-12-12 02:15:00.000+00:00 2016-12-12 02:16:00.000+00:00 11.2
2016-12-12 02:16:00.000+00:00 2016-12-12 02:31:00.000+00:00 13.5
2016-12-12 02:31:00.000+00:00 2016-12-12 02:45:00.000+00:00 14.8
2016-12-12 02:45:00.000+00:00 2016-12-12 02:59:00.000+00:00 15.9
2016-12-12 02:59:00.000+00:00 2016-12-12 03:31:00.000+00:00 16.0
So the result should be
coarse:
start_time end_time start_price
2016-12-12 01:00:00.000+00:00 2016-12-12 02:00:00.000+00:00 2.3
2016-12-12 02:00:00.000+00:00 2016-12-12 03:00:00.000+00:00 9.9
2016-12-12 03:00:00.000+00:00 2016-12-12 03:30:00.000+00:00 16.0
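(I skipped end_price because the logic is the same.) What is the simplest way to do this?
Also, instead of using the nearest neighbour, I might consider interpolating the price over time. Is there a simple way to do that?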
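You are probably looking for pd.merge_asof, which is well suited to joining on times that do not match exactly. It has a direction parameter ('backward', 'forward' or 'nearest') that you can adjust so the match goes in the direction you want. Both frames must be sorted by the join key.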
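To illustrate the direction parameter on its own, here is a minimal sketch with made-up integer keys. The frames left and right and the column names t and v are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical toy frames; both are already sorted by the key column "t".
left = pd.DataFrame({"t": [10, 20]})
right = pd.DataFrame({"t": [8, 11, 19, 24], "v": [1.0, 2.0, 3.0, 4.0]})

# Run the same join with each of the three directions and collect the matched values.
results = {
    d: pd.merge_asof(left, right, on="t", direction=d)["v"].tolist()
    for d in ("backward", "forward", "nearest")
}

# backward: last right key <= left key
# forward:  first right key >= left key
# nearest:  whichever right key is closest in absolute distance
for d, values in results.items():
    print(d, values)
```

With direction='nearest', each left row takes whichever neighbour is closer, which is the behaviour asked for in the question.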
import pandas as pd
coarse = pd.DataFrame({'start_time': ['2016-12-12 01:00:00.000+00:00', '2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00'], 'end_time': ['2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00', '2016-12-12 03:30:00.000+00:00']} )
fine = pd.DataFrame({'start_time': ['2016-12-12 00:59:00.000+00:00', '2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00'], 'end_time': ['2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00', '2016-12-12 03:31:00.000+00:00'], 'start_price': [2.3, 4.5, 5.7, 8.8, 9.9, 11.2, 13.5, 14.8, 15.9, 16.0]})
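# Parse the string columns into timezone-aware pandas Timestamps.
coarse['start_time'] = pd.to_datetime(coarse['start_time'])
fine['start_time'] = pd.to_datetime(fine['start_time'])
coarse['end_time'] = pd.to_datetime(coarse['end_time'])
fine['end_time'] = pd.to_datetime(fine['end_time'])
# merge_asof needs both frames sorted by the join key; the example data already is.
# Attach the start_price of the fine row whose start_time is nearest.
coarse = pd.merge_asof(coarse, fine[['start_time', 'start_price']],
                       on='start_time',
                       direction='nearest')
# Same idea for end_price. The example data has no end_price column,
# so start_price is matched on end_time and renamed to end_price.
coarse = pd.merge_asof(coarse,
                       fine[['end_time', 'start_price']].rename(columns={'start_price': 'end_price'}),
                       on='end_time',
                       direction='nearest')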
Output:
start_time end_time start_price end_price
0 2016-12-12 01:00:00+00:00 2016-12-12 02:00:00+00:00 2.3 8.8
1 2016-12-12 02:00:00+00:00 2016-12-12 03:00:00+00:00 9.9 15.9
2 2016-12-12 03:00:00+00:00 2016-12-12 03:30:00+00:00 16.0 16.0
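The follow-up question about interpolating by time is not covered by merge_asof. One possible approach, sketched here under assumptions, is linear interpolation with numpy.interp on elapsed seconds. The helper name interpolate_prices is hypothetical, and the data is a shortened version of the question's example with one extra query time added for illustration.

```python
import numpy as np
import pandas as pd

# Shortened version of the question's fine frame (first three rows only).
fine = pd.DataFrame({
    "start_time": pd.to_datetime(
        ["2016-12-12 00:59:00+00:00",
         "2016-12-12 01:12:00+00:00",
         "2016-12-12 01:15:00+00:00"], utc=True),
    "start_price": [2.3, 4.5, 5.7],
})
# Query times; the second one is made up for illustration.
coarse = pd.DataFrame({
    "start_time": pd.to_datetime(
        ["2016-12-12 01:00:00+00:00",
         "2016-12-12 01:13:00+00:00"], utc=True),
})


def interpolate_prices(query_times, known_times, known_prices):
    """Linearly interpolate prices at query_times (hypothetical helper).

    known_times must be sorted ascending. Query times outside the known
    range are clamped to the first or last known price by numpy.interp.
    """
    t0 = known_times.min()
    # Express every timestamp as seconds elapsed since the earliest known time.
    x = (query_times - t0).dt.total_seconds()
    xp = (known_times - t0).dt.total_seconds()
    return np.interp(x, xp, known_prices)


coarse["start_price"] = interpolate_prices(
    coarse["start_time"], fine["start_time"], fine["start_price"]
)
print(coarse)
```

Here 01:00 lies 1/13 of the way between 00:59 and 01:12, so its price falls between 2.3 and 4.5 in the same proportion. The same helper could be applied to end_time to obtain end_price.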