For example, take the following Series object:
import pandas as pd
mySeries = pd.Series( range(0,20,2), index=range(1,11), name='col')
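What is the correct way to access a single value?
I would say mySeries.iat[5] or mySeries.at[5], depending on whether we look the value up by position or by index label.
But I found that mySeries.tolist()[5] is 3 or 4 times faster than mySeries.iat[5], which in turn is faster than mySeries.at[5]. ("loc" and "iloc" are even worse.)
This surprised me; what is the point of "iat" and "at", then?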
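To make the position-versus-label distinction concrete, here is a minimal sketch using the same Series. The variable names by_position, by_label and via_list are only illustrative. Because the index runs from 1 to 10, position 5 and label 5 refer to different elements, so the three expressions are not interchangeable here:

```python
import pandas as pd

mySeries = pd.Series(range(0, 20, 2), index=range(1, 11), name='col')

by_position = mySeries.iat[5]     # sixth element (0-based position 5)
by_label = mySeries.at[5]         # element whose index label is 5 (fifth element)
via_list = mySeries.tolist()[5]   # plain list indexing is positional, like iat

print(by_position, by_label, via_list)  # prints: 10 8 10
```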
Because the test uses a small Series, which produces a short list, converting to a list and then indexing is very fast:
mySeries = pd.Series( range(0,20,2), index=range(1,11), name='col')
%timeit mySeries.iat[5]
3.61 µs ± 261 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.at[5]
5.11 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.tolist()
1.58 µs ± 78.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit mySeries.tolist()[5]
1.63 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
With 1M values it is much slower, because the conversion to a list becomes the bottleneck:
mySeries = pd.Series( range(0,2000000,2), name='col')
%timeit mySeries.iat[5]
3.46 µs ± 72.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.at[5]
4.74 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit mySeries.tolist()
40.2 ms ± 618 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit mySeries.tolist()[5]
40.3 ms ± 517 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
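Outside IPython, the same comparison can be approximated with the standard timeit module. This is only a sketch: time_access is a hypothetical helper, the repeat counts are arbitrary, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

def time_access(series, number=1000):
    """Return average seconds per call for each access pattern (hypothetical helper)."""
    candidates = {
        "iat": lambda: series.iat[5],
        "at": lambda: series.at[5],
        "tolist": lambda: series.tolist()[5],
    }
    return {
        name: timeit.timeit(fn, number=number) / number
        for name, fn in candidates.items()
    }

small = pd.Series(range(0, 20, 2), index=range(1, 11), name='col')
large = pd.Series(range(0, 2000000, 2), name='col')

small_times = time_access(small)
large_times = time_access(large, number=5)  # few repeats: tolist() copies 1M values each call

for label, times in (("small", small_times), ("large", large_times)):
    for name, seconds in times.items():
        print(f"{label:5s} {name:6s} {seconds * 1e6:12.2f} µs")
```

The cost of iat and at should stay roughly constant regardless of the Series length, while tolist() grows with the number of values. If many lookups are needed on a large Series, one option is to call tolist() once and reuse the resulting list rather than converting on every access.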