Vectorized solution for changing values in a pandas DataFrame column based on a boolean condition



My df looks like this:

              code       date type  strike  settlement
0   CBT_21_G2015_S 2015-01-02    C   126.2    1.343750
1   CBT_21_G2015_S 2015-01-02    P   131.7    4.359375
2   CBT_21_G2015_S 2015-01-02    C   102.5   24.671875
3   CBT_21_G2015_S 2015-01-02    P   110.5    0.015625
4   CBT_21_G2015_S 2015-01-02    P   101.2    0.015625
5   CBT_21_G2015_S 2015-01-02    C   140.5    0.015625

I want to convert the strikes to quarter strikes using the following rule: if df['strike'] % 0.25 != 0, add 0.05.

Expected output:

              code       date type  strike  settlement
0   CBT_21_G2015_S 2015-01-02    C  126.25    1.343750
1   CBT_21_G2015_S 2015-01-02    P  131.75    4.359375
2   CBT_21_G2015_S 2015-01-02    C  102.50   24.671875
3   CBT_21_G2015_S 2015-01-02    P  110.50    0.015625
4   CBT_21_G2015_S 2015-01-02    P  101.25    0.015625
5   CBT_21_G2015_S 2015-01-02    C  140.50    0.015625

What is the simplest/fastest way to do this?

A bit of math magic with np.ceil:

df['strike'] = np.ceil(df.strike * 4) / 4

df
             code        date type  strike  settlement
0  CBT_21_G2015_S  2015-01-02    C  126.25    1.343750
1  CBT_21_G2015_S  2015-01-02    P  131.75    4.359375
2  CBT_21_G2015_S  2015-01-02    C  102.50   24.671875
3  CBT_21_G2015_S  2015-01-02    P  110.50    0.015625
4  CBT_21_G2015_S  2015-01-02    P  101.25    0.015625
5  CBT_21_G2015_S  2015-01-02    C  140.50    0.015625

As the timings on 600,000 rows show, it is also really fast.

df = pd.concat([df] * 100000, ignore_index=True)
%timeit np.ceil(df.strike.values * 4) / 4
5.1 ms ± 60.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
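
One caveat worth checking before relying on the np.ceil approach: it rounds every strike up to the next multiple of 0.25, which is not literally the same rule as "add 0.05 when the strike is not a multiple of 0.25". The two agree on the sample data (strikes ending in .2 or .7), but a strike such as 126.1 would become 126.25 under np.ceil and 126.15 under the literal rule. The sketch below rebuilds just the strike column from the question (the variable names are only for illustration) and confirms the result on the sample values.

```python
import numpy as np
import pandas as pd

# Strike values copied from the question's sample frame.
strikes = pd.Series([126.2, 131.7, 102.5, 110.5, 101.2, 140.5])

# Multiply by 4 so that quarter strikes become whole numbers,
# round up to the next whole number, then scale back down.
quarter = np.ceil(strikes * 4) / 4

print(quarter.tolist())
# [126.25, 131.75, 102.5, 110.5, 101.25, 140.5]
```

Because multiples of 0.25 are exactly representable as floats, the output contains exact quarter values with no rounding residue.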

You need np.where:

df.strike = np.where(df.strike % 0.25 == 0, df.strike, df.strike + 0.05)
df
             code        date type  strike  settlement
0  CBT_21_G2015_S  2015-01-02    C  126.25    1.343750
1  CBT_21_G2015_S  2015-01-02    P  131.75    4.359375
2  CBT_21_G2015_S  2015-01-02    C  102.50   24.671875
3  CBT_21_G2015_S  2015-01-02    P  110.50    0.015625
4  CBT_21_G2015_S  2015-01-02    P  101.25    0.015625
5  CBT_21_G2015_S  2015-01-02    C  140.50    0.015625
