我正在尝试创建一个函数,当找到一个条件时(特别是当列值=="Trade"时(,它会搜索两列的前几行,并找到与交易price
匹配的值,计算在每一列中找到价格的次数。
我有一个报价/订单记录的Pandas数据帧:
import io
import pandas as pd
data_raw = """
date_time, type, price, bid_price, ask_price,
2020-11-16 01:39:06.221102501, Quote, 7045.5, 7045.0, 7047.0
2020-11-16 01:39:06.221102501, Trade, 7045.5, 7045.0, 7047.0
2020-11-16 01:39:06.221102501, Trade, 7045.5, 7045.0, 7047.0
2020-11-16 01:39:09.044653308, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.044653308, Quote, 7045.0, 7045.0, 7046.5
2020-11-16 01:39:09.044653308, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.044653308, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.824195125, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.824195125, Quote, 7045.0, 7045.0, 7046.5
2020-11-16 01:39:09.824195125, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:09.984520580, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:10.408151534, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:10.564184018, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:11.008204633, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:11.032939855, Quote, 7045.0, 7045.0, 7046.0
2020-11-16 01:39:11.230065390, Quote, 7045.0, 7044.0, 7046.0
2020-11-16 01:39:11.230065390, Trade, 7045.0, 7044.0, 7046.0
2020-11-16 01:39:11.232176362, Quote, 7045.0, 7044.0, 7045.5
2020-11-16 01:39:12.924164557, Quote, 7045.0, 7044.0, 7046.0
"""
df = pd.read_csv(io.StringIO(data_raw), sep=",")
df
我想创建两个单独的列,它们是一笔交易与bid_price
列或ask_price
列上最近的前报价匹配的累计次数之和。我怀疑我需要创建一个函数来实现这一点。
例如,当记录type
为"Trade",价格为7040.0时,我希望函数搜索前n条记录并找到匹配的报价。如果价格在"bid_price"上匹配,那么我可以+=1出价计数,反之亦然,如果它在"ask_price"上匹配。
我尝试过将"date_time"列设置为索引并按时间顺序排序,然而,交易并不总是紧跟在报价之后。
我最初的想法是使用类似pd.index.get_loc(df['price'], method='ffill', tolerance=10)
的东西,但是,我不知道如何搜索单独的bid_price
或ask_price
列?
预期输出:
date_time type price bid_price ask_price cum_bid cum_ask
2020-11-16 01:39:06.221102501 Quote 7045.5 7045.0 7047.0 0 0
2020-11-16 01:39:06.221102501 Trade 7045.5 7045.0 7047.0 1 0
2020-11-16 01:39:06.221102501 Trade 7045.5 7045.0 7047.0 2 0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.5 2 0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0 7046.5 2 0
2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:09.984520580 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:10.408151534 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:10.564184018 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:11.008204633 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:11.032939855 Quote 7045.0 7045.0 7046.0 2 0
2020-11-16 01:39:11.230065390 Quote 7045.0 7044.0 7046.0 2 0
2020-11-16 01:39:11.230065390 Trade 7045.0 7044.0 7046.0 3 0
2020-11-16 01:39:11.232176362 Quote 7045.0 7044.0 7045.5 3 0
2020-11-16 01:39:12.924164557 Quote 7045.0 7044.0 7046.0 3 0
感谢您的帮助。
回填价格列
df['price'] = df['price'].bfill()
检查它在哪里等于买卖栏,计算累计金额并用交易栏进行过滤
df['cum_ask'] = df['price'].eq(df['ask_price']).cumsum().where(df['type'].eq(' Trade'), None).ffill()
df['cum_bid'] = df['price'].eq(df['bid_price']).cumsum().where(df['type'].eq(' Trade'), None).ffill()
最终清理创建了nans
df[['cum_bid', 'cum_ask']] = df[['cum_bid', 'cum_ask']].fillna(0).astype(int)
date_time type price bid_price ask_price cum_ask cum_bid
0 2020-11-16 01:39:06.221102501 Quote 7045.5 7045.0 7047.0 0 0
1 2020-11-16 01:39:06.221102501 Trade 7045.5 7045.0 7047.0 0 0
2 2020-11-16 01:39:06.221102501 Trade 7045.5 7045.0 7047.0 0 0
3 2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.0 0 0
4 2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.5 0 0
5 2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.0 0 0
6 2020-11-16 01:39:09.044653308 Quote 7045.0 7045.0 7046.0 0 0
7 2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0 7046.0 0 0
8 2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0 7046.5 0 0
9 2020-11-16 01:39:09.824195125 Quote 7045.0 7045.0 7046.0 0 0
10 2020-11-16 01:39:09.984520580 Quote 7045.0 7045.0 7046.0 0 0
11 2020-11-16 01:39:10.408151534 Quote 7045.0 7045.0 7046.0 0 0
12 2020-11-16 01:39:10.564184018 Quote 7045.0 7045.0 7046.0 0 0
13 2020-11-16 01:39:11.008204633 Quote 7045.0 7045.0 7046.0 0 0
14 2020-11-16 01:39:11.032939855 Quote 7045.0 7045.0 7046.0 0 0
15 2020-11-16 01:39:11.230065390 Quote 7045.0 7044.0 7046.0 0 0
16 2020-11-16 01:39:11.230065390 Trade 7045.0 7044.0 7046.0 0 12
17 2020-11-16 01:39:11.232176362 Quote 7045.0 7044.0 7045.5 0 12
18 2020-11-16 01:39:12.924164557 Quote 7045.0 7044.0 7046.0 0 12