我有以下熊猫df,我想创建另一列名为df['trade_percentage'],其中每行都有by,如果df['trade'] == "ask",df['trade_percentage'] = df['trade_size']/df['ask_size'] 否则如果df['trade'] == "bid", df['trade_percentage'] = df['trade_size']/df['bid_size']。忽略 NaN。我写了这些代码:
但是,我的数据帧中有 100 万行,它已经运行了几个小时,并且仍在运行。请问是否有任何方法可以简化它而不是使用循环?谢谢!
for i in range(len(df)):
if (df['trade'][i] == "ask"):
df['trade_percentage'][i] = df['trade_size'][i]/df['ask_size'][i]
elif (df['trade'][i] == "bid"):
df['trade_percentage'][i] = df['trade_size'][i]/df['bid_size'][i]
df.head()
time bid_size bid ask ask_size trade trade_size phase
0 2019-01-07 07:45:01.064515 495 152.52 152.54 19 NaN NaN OPEN
1 2019-01-07 07:45:01.110072 31 152.53 152.54 19 NaN NaN OPEN
2 2019-01-07 07:45:01.116596 32 152.53 152.54 19 NaN NaN OPEN
3 2019-01-07 07:45:01.116860 32 152.53 152.54 21 NaN NaN OPEN
4 2019-01-07 07:45:01.116905 34 152.53 152.54 21 NaN NaN OPEN
5 2019-01-07 07:45:01.116982 34 152.53 152.54 31 NaN NaN OPEN
6 2019-01-07 07:45:01.147901 38 152.53 152.54 31 NaN NaN OPEN
7 2019-01-07 07:45:01.189971 38 152.53 152.54 31 ask 15.0 OPEN
8 2019-01-07 07:45:01.189971 38 152.53 152.54 16 NaN NaN OPEN
9 2019-01-07 07:45:01.190766 37 152.53 152.54 16 NaN NaN OPEN
10 2019-01-07 07:45:01.190856 37 152.53 152.54 15 NaN NaN OPEN
11 2019-01-07 07:45:01.190856 37 152.53 152.54 16 ask 1.0 OPEN
12 2019-01-07 07:45:01.193938 37 152.53 152.55 108 NaN NaN OPEN
13 2019-01-07 07:45:01.193938 37 152.53 152.54 15 ask 15.0 OPEN
14 2019-01-07 07:45:01.194326 2 152.54 152.55 108 NaN NaN OPEN
15 2019-01-07 07:45:01.194453 2 152.54 152.55 97 NaN NaN OPEN
16 2019-01-07 07:45:01.194479 6 152.54 152.55 97 NaN NaN OPEN
17 2019-01-07 07:45:01.194507 19 152.54 152.55 97 NaN NaN OPEN
18 2019-01-07 07:45:01.194532 19 152.54 152.55 77 NaN NaN OPEN
19 2019-01-07 07:45:01.194598 19 152.54 152.55 79 NaN NaN OPEN
我正在使用np.select
s1=df['trade'] == "ask"
s2=df['trade'] == "bid"
np.select([s1,s2],[df['trade_size']/df['ask_size'],df['trade_size']/df['bid_size']],default =np.nan)