Pandas:在给定值的DataFrame中搜索最近的前一行



我正在尝试创建一个函数,当找到一个条件时(特别是当列值=="Trade"时(,它会搜索两列的前几行,并找到与交易price匹配的值,计算在每一列中找到价格的次数。

我有一个报价/订单记录的Pandas数据帧:

import io
import pandas as pd

data_raw = """
date_time, type, price, bid_price, ask_price,
2020-11-16 01:39:06.221102501, Quote,  7045.5, 7045.0, 7047.0 
2020-11-16 01:39:06.221102501, Trade,  7045.5, 7045.0, 7047.0 
2020-11-16 01:39:06.221102501, Trade,  7045.5, 7045.0, 7047.0 
2020-11-16 01:39:09.044653308, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.044653308, Quote,  7045.0, 7045.0, 7046.5
2020-11-16 01:39:09.044653308, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.044653308, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.824195125, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.824195125, Quote,  7045.0, 7045.0, 7046.5
2020-11-16 01:39:09.824195125, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.984520580, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:10.408151534, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:10.564184018, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:11.008204633, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:11.032939855, Quote,  7045.0, 7045.0, 7046.0
2020-11-16 01:39:11.230065390, Quote, 7045.0, 7044.0,  7046.0
2020-11-16 01:39:11.230065390, Trade,  7045.0, 7044.0, 7046.0
2020-11-16 01:39:11.232176362, Quote,  7045.0, 7044.0, 7045.5
2020-11-16 01:39:12.924164557, Quote,  7045.0, 7044.0, 7046.0
"""
df = pd.read_csv(io.StringIO(data_raw), sep=",")
df

我想创建两个单独的列,它们是一笔交易与bid_price列或ask_price列上最近的报价匹配的累计次数之和。我怀疑我需要创建一个函数来实现这一点。

例如,当记录type为"Trade",价格为7040.0时,我希望函数搜索前n条记录并找到匹配的报价。如果价格在"bid_price"上匹配,那么我可以+=1出价计数,反之亦然,如果它在"ask_price"上匹配。

我尝试过将"date_time"列设置为索引并按时间顺序排序,然而,交易并不总是紧跟在报价之后。

我最初的想法是使用类似pd.index.get_loc(df['price'], method='ffill', tolerance=10)的东西,但是,我不知道如何搜索单独的bid_priceask_price列?

预期输出:

date_time                     type  price  bid_price ask_price cum_bid cum_ask
2020-11-16 01:39:06.221102501 Quote 7045.5 7045.0    7047.0    0       0
2020-11-16 01:39:06.221102501 Trade 7045.5 7045.0    7047.0    1       0
2020-11-16 01:39:06.221102501 Trade 7045.5 7045.0    7047.0    2       0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0    7046.5    2       0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0    7046.5    2       0
2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:09.984520580 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:10.408151534 Quote 7045.0 7045.0    7046.0    2       0     
2020-11-16 01:39:10.564184018 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:11.008204633 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:11.032939855 Quote 7045.0 7045.0    7046.0    2       0
2020-11-16 01:39:11.230065390 Quote 7045.0 7044.0    7046.0    2       0
2020-11-16 01:39:11.230065390 Trade 7045.0 7044.0    7046.0    3       0
2020-11-16 01:39:11.232176362 Quote 7045.0 7044.0    7045.5    3       0
2020-11-16 01:39:12.924164557 Quote 7045.0 7044.0    7046.0    3       0

感谢您的帮助。

回填价格列

df['price'] = df['price'].bfill()

检查它在哪里等于买卖栏,计算累计金额并用交易栏进行过滤

df['cum_ask'] = df['price'].eq(df['ask_price']).cumsum().where(df['type'].eq(' Trade'), None).ffill()
df['cum_bid'] = df['price'].eq(df['bid_price']).cumsum().where(df['type'].eq(' Trade'), None).ffill()

最终清理创建了nans

df[['cum_bid', 'cum_ask']] = df[['cum_bid', 'cum_ask']].fillna(0).astype(int)

date_time    type   price  bid_price  ask_price  cum_ask  cum_bid
0   2020-11-16 01:39:06.221102501   Quote  7045.5     7045.0     7047.0        0        0
1   2020-11-16 01:39:06.221102501   Trade  7045.5     7045.0     7047.0        0        0
2   2020-11-16 01:39:06.221102501   Trade  7045.5     7045.0     7047.0        0        0
3   2020-11-16 01:39:09.044653308   Quote  7045.0     7045.0     7046.0        0        0
4   2020-11-16 01:39:09.044653308   Quote  7045.0     7045.0     7046.5        0        0
5   2020-11-16 01:39:09.044653308   Quote  7045.0     7045.0     7046.0        0        0
6   2020-11-16 01:39:09.044653308   Quote  7045.0     7045.0     7046.0        0        0
7   2020-11-16 01:39:09.824195125   Quote  7045.0     7045.0     7046.0        0        0
8   2020-11-16 01:39:09.824195125   Quote  7045.0     7045.0     7046.5        0        0
9   2020-11-16 01:39:09.824195125   Quote  7045.0     7045.0     7046.0        0        0
10  2020-11-16 01:39:09.984520580   Quote  7045.0     7045.0     7046.0        0        0
11  2020-11-16 01:39:10.408151534   Quote  7045.0     7045.0     7046.0        0        0
12  2020-11-16 01:39:10.564184018   Quote  7045.0     7045.0     7046.0        0        0
13  2020-11-16 01:39:11.008204633   Quote  7045.0     7045.0     7046.0        0        0
14  2020-11-16 01:39:11.032939855   Quote  7045.0     7045.0     7046.0        0        0
15  2020-11-16 01:39:11.230065390   Quote  7045.0     7044.0     7046.0        0        0
16  2020-11-16 01:39:11.230065390   Trade  7045.0     7044.0     7046.0        0       12
17  2020-11-16 01:39:11.232176362   Quote  7045.0     7044.0     7045.5        0       12
18  2020-11-16 01:39:12.924164557   Quote  7045.0     7044.0     7046.0        0       12

最新更新