Python: improving the speed of a stress test over multiple parameter values



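I would like to speed up the following code. The dataset is a list of trades that I want to stress test by simulating various parameters, and I want to store all the results in a single table.

My approach is to define a range for each parameter, iterate over their values, make a copy of the dataset, assign the parameter values to new columns, and concatenate everything into one huge DataFrame.

Does anyone have a good idea for avoiding the three nested for loops when building the DataFrame?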

```python
import pandas as pd

# `pos` is the initial DataFrame of trades
# (with columns 'ImpliedVolatility', 'spread' and 'spot')

# Defining the range of parameters to simulate
volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)
# the df where I store all the results
final_result = pd.DataFrame()

# Iterating over the range of parameters
for vol in volchange:
    for spread in spreadchange:
        for flat in flatchange:

            # Creating a copy of the initial dataset, assigning the simulated values to three
            # new columns and concatenating it with the rest, resulting in a dataframe which is
            # several times the initial dataset with all the possible triplets of parameters
            inter_pos = pos.copy()

            inter_pos['vol_change[pts]'] = vol
            inter_pos['spread_change[%]'] = spread
            inter_pos['spot_change[%]'] = flat

            final_result = pd.concat([final_result, inter_pos], axis=0)

# Performing computation at dataframe level
final_result['sim_vol'] = final_result['vol_change[pts]'] + final_result['ImpliedVolatility']
final_result['spread_change'] = final_result['spread'].multiply(final_result['spread_change[%]']) / 100
final_result['sim_spread'] = final_result['spread'] + final_result['spread_change']
final_result['spot_change'] = final_result['spot'] * final_result['spot_change[%]'] / 100
final_result['sim_spot'] = final_result['spot'] + final_result['spot_change']
final_result['sim_price'] = final_result['sim_spot'] - final_result['sim_spread']
```

Thanks a lot for your help!

Have a great week!

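Repeatedly concatenating a pandas DataFrame onto another one takes a long time, because every pd.concat call copies all the rows accumulated so far. It is better to build a list of DataFrames and then concatenate them all at once with a single pd.concat call.

You can test this yourself: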

```python
import pandas as pd
import numpy as np
from time import time

# Approach 1: collect the DataFrames in a list, concatenate once at the end
dfs = []
columns = [f"{i:02d}" for i in range(100)]
time_start = time()
for i in range(100):
    data = np.random.random((10000, 100))
    df = pd.DataFrame(columns=columns, data=data)
    dfs.append(df)
new_df = pd.concat(dfs)
time_end = time()
print(f"Time elapsed: {time_end-time_start}")
# Time elapsed: 1.851675271987915

# Approach 2: concatenate inside the loop (copies the growing frame every time)
new_df = pd.DataFrame(columns=columns)
time_start = time()
for i in range(100):
    data = np.random.random((10000, 100))
    df = pd.DataFrame(columns=columns, data=data)
    new_df = pd.concat([new_df, df])
time_end = time()
print(f"Time elapsed: {time_end-time_start}")
# Time elapsed: 12.258363008499146
```
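Applied to the loop from the question, the list-then-concat pattern might look like the minimal sketch below. The small pos DataFrame is a hypothetical stand-in for the real trades dataset, and ignore_index=True is an addition that gives the result a fresh 0..n-1 index instead of repeated copies of the index of pos.

```python
import pandas as pd

# Hypothetical stand-in for the question's `pos` trades DataFrame
pos = pd.DataFrame({
    "ImpliedVolatility": [20.0, 25.0],
    "spread": [1.0, 2.0],
    "spot": [100.0, 50.0],
})

volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)

# Collect one modified copy per parameter triplet...
frames = []
for vol in volchange:
    for spread in spreadchange:
        for flat in flatchange:
            inter_pos = pos.copy()
            inter_pos["vol_change[pts]"] = vol
            inter_pos["spread_change[%]"] = spread
            inter_pos["spot_change[%]"] = flat
            frames.append(inter_pos)

# ...and concatenate them once, outside the loops
final_result = pd.concat(frames, axis=0, ignore_index=True)
print(final_result.shape)  # (2646, 6): 2 trades x 3 x 21 x 21 triplets
```

The computation part of the question can then run unchanged on final_result.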

You can also use itertools.product to get rid of the nested for loops.
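A minimal sketch of itertools.product replacing the three nested loops, again with a hypothetical pos DataFrame. DataFrame.assign is used here instead of copy() plus column assignment (a stylistic choice, not part of the answer); the dict unpacking is needed because the column names contain brackets and so cannot be written as plain keyword arguments.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for the question's `pos` trades DataFrame
pos = pd.DataFrame({
    "ImpliedVolatility": [20.0, 25.0],
    "spread": [1.0, 2.0],
    "spot": [100.0, 50.0],
})

volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)

frames = []
# product() yields every (vol, spread, flat) triplet in the same order as the
# original nested loops, so a single loop is enough
for vol, spread, flat in itertools.product(volchange, spreadchange, flatchange):
    # assign() returns a new DataFrame, so no explicit copy() is needed
    frames.append(pos.assign(**{
        "vol_change[pts]": vol,
        "spread_change[%]": spread,
        "spot_change[%]": flat,
    }))

final_result = pd.concat(frames, ignore_index=True)
```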

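Also suggested by @Ahmed AEK:

You can pass data=itertools.product(volchange, spreadchange, flatchange) directly to pd.DataFrame instead of building the list of tuples yourself (pandas still materializes the iterator internally), which is shorter and avoids an intermediate list in your own code.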
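Taking that suggestion one step further, the parameter grid can be built as its own DataFrame and combined with every trade in one cross join, so no Python-level loop over the triplets remains. This is a sketch assuming pandas 1.2 or later (needed for merge(how="cross")); the cross merge is a swapped-in technique not mentioned in the answer above, and pos is again a hypothetical stand-in.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for the question's `pos` trades DataFrame
pos = pd.DataFrame({
    "ImpliedVolatility": [20.0, 25.0],
    "spread": [1.0, 2.0],
    "spot": [100.0, 50.0],
})

volchange = range(-1, 2)
spreadchange = range(-10, 11)
flatchange = range(-10, 11)

# One row per parameter triplet, built directly from the product iterator
params = pd.DataFrame(
    itertools.product(volchange, spreadchange, flatchange),
    columns=["vol_change[pts]", "spread_change[%]", "spot_change[%]"],
)

# Cartesian product: every trade paired with every triplet (pandas >= 1.2)
final_result = pos.merge(params, how="cross")

# Same column computations as in the question, done once on the whole frame
final_result["sim_vol"] = final_result["vol_change[pts]"] + final_result["ImpliedVolatility"]
final_result["spread_change"] = final_result["spread"] * final_result["spread_change[%]"] / 100
final_result["sim_spread"] = final_result["spread"] + final_result["spread_change"]
final_result["spot_change"] = final_result["spot"] * final_result["spot_change[%]"] / 100
final_result["sim_spot"] = final_result["spot"] + final_result["spot_change"]
final_result["sim_price"] = final_result["sim_spot"] - final_result["sim_spread"]
```

Note that the cross merge orders rows trade by trade (all triplets for the first trade, then the second), whereas the loop versions order them triplet by triplet; sort the result if that ordering matters.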
