Why is pandas.Series.tolist() faster than pandas.Series.iat[]?



For example, take the following Series object:

import pandas as pd
mySeries = pd.Series(range(0, 20, 2), index=range(1, 11), name='col')

What is the right way to access a single value?

I would say mySeries.iat[5] or mySeries.at[5], depending on whether we access by position or by index label.
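To make the position-versus-label distinction concrete, here is a minimal sketch using the same Series. Because the index starts at 1, position 5 and label 5 refer to different elements; the variable names by_position, by_label and via_list are only illustrative.

```python
import pandas as pd

mySeries = pd.Series(range(0, 20, 2), index=range(1, 11), name='col')

# .iat is purely positional: position 5 is the sixth element.
by_position = mySeries.iat[5]

# .at is label-based: label 5 sits at position 4 because the index starts at 1.
by_label = mySeries.at[5]

# A plain Python list only knows positions, so this matches .iat, not .at.
via_list = mySeries.tolist()[5]

print(by_position, by_label, via_list)  # prints: 10 8 10
```

So tolist()[5] is a substitute for iat[5] only; it cannot replace at[5] unless the index happens to be the default 0..n-1 range.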

But I found that mySeries.tolist()[5] is 3 or 4 times faster than mySeries.iat[5], which in turn is faster than mySeries.at[5]. ("loc" and "iloc" are even worse.)

This surprised me; what, then, are the advantages of "iat" and "at"?

Because the test builds a short list from a small Series, converting to a list and then indexing it is very fast:

mySeries = pd.Series( range(0,20,2), index=range(1,11), name='col')

%timeit mySeries.iat[5]
3.61 µs ± 261 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.at[5]
5.11 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.tolist()
1.58 µs ± 78.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit mySeries.tolist()[5]
1.63 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

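With 1M values, tolist() is far slower, because the conversion to a list becomes the bottleneck, while iat and at take about the same time as before: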

mySeries = pd.Series( range(0,2000000,2),  name='col')

%timeit mySeries.iat[5]
3.46 µs ± 72.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.at[5]
4.74 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.tolist()
40.2 ms ± 618 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit mySeries.tolist()[5]
40.3 ms ± 517 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
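Outside IPython, the same comparison can be reproduced roughly with the standard timeit module. This is only a sketch: the helper name time_access and the repeat counts are my own choices, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def time_access(series, position, repeats=3, number=10):
    """Return the best observed seconds per call for each access pattern."""
    candidates = {
        "iat": lambda: series.iat[position],
        "tolist": lambda: series.tolist()[position],
    }
    timings = {}
    for name, func in candidates.items():
        # timeit.repeat returns, per repeat, the total seconds for `number` calls.
        totals = timeit.repeat(func, repeat=repeats, number=number)
        timings[name] = min(totals) / number
    return timings


small = pd.Series(range(0, 20, 2), name="col")
large = pd.Series(range(0, 2_000_000, 2), name="col")

for label, series in (("small", small), ("large", large)):
    for name, seconds in time_access(series, 5).items():
        print(f"{label:5s} {name:7s} {seconds * 1e6:12.2f} us per call")
```

The point of the sketch is that the cost of iat should stay roughly constant as the Series grows, whereas tolist() has to copy every element before a single value can be read.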
