Compute a column in a DataFrame based on the nearest value in another DataFrame



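I have two DataFrames, coarse and fine. fine contains the columns start_time, end_time, start_price, end_price, and coarse contains start_time, end_time. Note that all times are pandas Timestamp objects, e.g. 2016-12-12 01:03:13.15231+00:00

For coarse, I need to add 2 extra columns, start_price and end_price, where coarse.start_price is the fine.start_price of the row whose fine.start_time is closest to coarse.start_time, and similarly for end_price.

Here is a clear example: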

coarse:
start_time                              end_time
2016-12-12 01:00:00.000+00:00           2016-12-12 02:00:00.000+00:00
2016-12-12 02:00:00.000+00:00           2016-12-12 03:00:00.000+00:00
2016-12-12 03:00:00.000+00:00           2016-12-12 03:30:00.000+00:00
fine:
start_time                              end_time                         start_price 
2016-12-12 00:59:00.000+00:00           2016-12-12 01:12:00.000+00:00    2.3
2016-12-12 01:12:00.000+00:00           2016-12-12 01:15:00.000+00:00    4.5
2016-12-12 01:15:00.000+00:00           2016-12-12 01:45:00.000+00:00    5.7
2016-12-12 01:45:00.000+00:00           2016-12-12 01:55:00.000+00:00    8.8
2016-12-12 01:55:00.000+00:00           2016-12-12 02:15:00.000+00:00    9.9
2016-12-12 02:15:00.000+00:00           2016-12-12 02:16:00.000+00:00    11.2
2016-12-12 02:16:00.000+00:00           2016-12-12 02:31:00.000+00:00    13.5
2016-12-12 02:31:00.000+00:00           2016-12-12 02:45:00.000+00:00    14.8
2016-12-12 02:45:00.000+00:00           2016-12-12 02:59:00.000+00:00    15.9
2016-12-12 02:59:00.000+00:00           2016-12-12 03:31:00.000+00:00    16.0

So the result should be

coarse:

start_time                              end_time                         start_price
2016-12-12 01:00:00.000+00:00           2016-12-12 02:00:00.000+00:00    2.3
2016-12-12 02:00:00.000+00:00           2016-12-12 03:00:00.000+00:00    9.9
2016-12-12 03:00:00.000+00:00           2016-12-12 03:30:00.000+00:00    16.0

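(I skipped end_price because it follows the same logic.) What is the simplest way to do this?

Also, instead of using the nearest neighbor, I might consider interpolating the price based on time. Is there a simple way to do that?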

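You are probably looking for pd.merge_asof, which is well suited to joining on times that do not match exactly. It has a direction parameter ('backward', 'forward' or 'nearest') that you can adjust so that matches go in the direction you want. Both DataFrames must be sorted by the key column.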
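To make the effect of direction concrete, here is a minimal sketch on toy data. The frames `left` and `right` and the column names `t` and `price` are made up for illustration and are not the frames from the question.

```python
import pandas as pd

# Toy data (hypothetical): one lookup time and two candidate rows around it.
left = pd.DataFrame({"t": pd.to_datetime(["2016-12-12 02:00:00+00:00"])})
right = pd.DataFrame({
    "t": pd.to_datetime(["2016-12-12 01:50:00+00:00",
                         "2016-12-12 02:05:00+00:00"]),
    "price": [9.9, 11.2],
})

# merge_asof expects both inputs to be sorted by the key column already.
matches = {
    d: pd.merge_asof(left, right, on="t", direction=d)["price"].iloc[0]
    for d in ("backward", "forward", "nearest")
}

# backward takes the last row at or before 02:00 (9.9),
# forward takes the first row at or after 02:00 (11.2),
# nearest takes whichever is closer in time (02:05 is 5 minutes away, so 11.2).
print(matches)
```

merge_asof also accepts a tolerance argument (for example pd.Timedelta('10min')) if matches that are too far away should be left as NaN. Applied to the frames from the question: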

import pandas as pd
coarse = pd.DataFrame({'start_time': ['2016-12-12 01:00:00.000+00:00', '2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00'], 'end_time': ['2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00', '2016-12-12 03:30:00.000+00:00']} )
fine = pd.DataFrame({'start_time': ['2016-12-12 00:59:00.000+00:00', '2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00'], 'end_time': ['2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00', '2016-12-12 03:31:00.000+00:00'], 'start_price': [2.3, 4.5, 5.7, 8.8, 9.9, 11.2, 13.5, 14.8, 15.9, 16.0]})
coarse['start_time'] = pd.to_datetime(coarse['start_time'])
fine['start_time'] = pd.to_datetime(fine['start_time'])
coarse['end_time'] = pd.to_datetime(coarse['end_time'])
fine['end_time'] = pd.to_datetime(fine['end_time'])

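# start_price: take the price of the fine row whose start_time is nearest
coarse = pd.merge_asof(coarse, fine[['start_time', 'start_price']],
                       on='start_time',
                       direction='nearest')
# end_price: same idea, matching on end_time and renaming the price column
coarse = pd.merge_asof(coarse,
                       fine[['end_time', 'start_price']].rename(columns={'start_price': 'end_price'}),
                       on='end_time',
                       direction='nearest')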

Output
start_time                  end_time  start_price  end_price
0 2016-12-12 01:00:00+00:00 2016-12-12 02:00:00+00:00          2.3        8.8
1 2016-12-12 02:00:00+00:00 2016-12-12 03:00:00+00:00          9.9       15.9
2 2016-12-12 03:00:00+00:00 2016-12-12 03:30:00+00:00         16.0       16.0
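For the interpolation part of the question, one possible approach (a sketch, not something merge_asof does for you) is to convert the timestamps to elapsed seconds and use np.interp for linear interpolation. The helper `to_seconds` is a hypothetical name introduced here, and the frames below reuse only a few rows from the question's data.

```python
import numpy as np
import pandas as pd

# A subset of the question's fine data, sorted by start_time.
fine = pd.DataFrame({
    "start_time": pd.to_datetime(["2016-12-12 01:55:00+00:00",
                                  "2016-12-12 02:15:00+00:00",
                                  "2016-12-12 02:59:00+00:00"]),
    "start_price": [9.9, 11.2, 16.0],
})
coarse = pd.DataFrame({
    "start_time": pd.to_datetime(["2016-12-12 02:00:00+00:00",
                                  "2016-12-12 03:00:00+00:00"]),
})

origin = fine["start_time"].iloc[0]

def to_seconds(times):
    """Hypothetical helper: seconds elapsed since the first fine timestamp."""
    return (times - origin).dt.total_seconds().to_numpy()

# np.interp needs increasing x values, so fine must be sorted by time.
# Points outside the range of fine are clamped to the first or last price.
coarse["start_price"] = np.interp(
    to_seconds(coarse["start_time"]),
    to_seconds(fine["start_time"]),
    fine["start_price"].to_numpy(),
)
print(coarse)
```

Here 02:00 lies a quarter of the way between 01:55 and 02:15, so it gets 9.9 + 0.25 * (11.2 - 9.9) = 10.225, while 03:00 falls after the last fine row and is clamped to 16.0.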
