How to get the n values closest to zero in a pandas Series



How can I get the n values closest to 0, similar to how nsmallest() returns the n smallest values? For example, given

series = pd.Series([-1.0,-0.75,-0.5,-0.25,0.25,0.5,0.75,1.0])
series
0   -1.00
1   -0.75
2   -0.50
3   -0.25
4    0.25
5    0.50
6    0.75
7    1.00
dtype: float64

with n=4, I would like to get the following:

0   -0.25
1    0.25
2   -0.50
3   0.50
dtype: float64

Use loc, abs, and nsmallest:

series.loc[series.abs().nsmallest(4).index]
3   -0.25
4    0.25
2   -0.50
5    0.50
dtype: float64
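
The one-liner above can be wrapped in a small helper for reuse. This is only a minimal sketch: the function name n_closest_to_zero is hypothetical and not part of pandas, and it assumes the index labels are unique, because the lookup goes through .loc by label.

```python
import pandas as pd


def n_closest_to_zero(series: pd.Series, n: int) -> pd.Series:
    """Return the n values of `series` with the smallest absolute value.

    Hypothetical helper (not a pandas API); original index labels are kept.
    """
    # nsmallest on the absolute values yields the labels of the n closest entries
    labels = series.abs().nsmallest(n).index
    # .loc looks up the original signed values by those labels
    return series.loc[labels]


series = pd.Series([-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0])
result = n_closest_to_zero(series, 4)
print(result)
```

With a non-unique index, .loc would return every row that shares a selected label, so a positional lookup with .iloc is the safer choice in that case.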

If performance matters, use Series.abs and Series.argsort to get the positions, keep the first n, and select by position with Series.iloc:

n = 4
series = series.iloc[series.abs().argsort()[:n]]
print (series)
3   -0.25
4    0.25
2   -0.50
5    0.50
dtype: float64

If needed, reset to a default index at the end:

n = 4
series = series.iloc[series.abs().argsort()[:n]].reset_index(drop=True)
print (series)
0   -0.25
1    0.25
2   -0.50
3    0.50
dtype: float64
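
The positional approach can also be sketched as a function that leaves the input untouched rather than reassigning series. The name closest_to_zero_positional and its reset_index flag are hypothetical; the sketch swaps in numpy.argsort with kind="stable" (instead of Series.argsort with its default kind) so that values with equal absolute value keep their original relative order.

```python
import numpy as np
import pandas as pd


def closest_to_zero_positional(series: pd.Series, n: int,
                               reset_index: bool = False) -> pd.Series:
    """Hypothetical helper: n values closest to zero, selected by position."""
    # Positions ordered by absolute value; "stable" preserves the order of ties
    order = np.argsort(series.abs().to_numpy(), kind="stable")[:n]
    # .iloc selects by position, so duplicate index labels are not a problem
    out = series.iloc[order]
    # Optionally replace the original labels with a default 0..n-1 index
    return out.reset_index(drop=True) if reset_index else out


series = pd.Series([-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0])
kept = closest_to_zero_positional(series, 4)
renumbered = closest_to_zero_positional(series, 4, reset_index=True)
print(kept)
print(renumbered)
```

Because the function returns a new Series, the original data stays available for further calls with a different n.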

Performance

series = pd.Series([-1.0,-0.75,-0.5,-0.25,0.25,0.5,0.75,1.0] * 10000)
n = 4000
series = series.iloc[series.abs().argsort()[:n]]
print (series)
In [114]: %timeit series.iloc[series.abs().argsort()[:n]]
794 µs ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [115]: %timeit series.loc[series.abs().nsmallest(n).index]
2.09 ms ± 34.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
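
To repeat the comparison outside IPython, something like the following sketch with the standard timeit module could be used. The names data, by_argsort, by_nsmallest, and runs are illustrative, and the series is deliberately not reassigned so that both calls run on the full 80,000 values. Absolute timings depend on the machine and the pandas version, so no expected numbers are shown.

```python
import timeit

import pandas as pd

data = pd.Series([-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0] * 10000)
n = 4000


def by_argsort():
    # Positional selection: argsort of the absolute values, then iloc
    return data.iloc[data.abs().argsort()[:n]]


def by_nsmallest():
    # Label-based selection: nsmallest of the absolute values, then loc
    return data.loc[data.abs().nsmallest(n).index]


runs = 20  # illustrative repeat count; raise it for steadier averages
t_argsort = timeit.timeit(by_argsort, number=runs) / runs
t_nsmallest = timeit.timeit(by_nsmallest, number=runs) / runs

print(f"argsort + iloc:  {t_argsort * 1e3:.3f} ms per call")
print(f"nsmallest + loc: {t_nsmallest * 1e3:.3f} ms per call")
```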
