Speed up the search for the closest upper and lower values in a pandas DataFrame

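My dataframe looks similar to the example below (only with many more entries). For each group, I want to get the closest numbers above and below a given value.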

For example, for the value 13, I would like to get a new dataframe similar to:

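I have already tried the solution by Ivo Merchiers from "How do I find the closest values in a Pandas series to an input number?", using groupby and apply to run it for the different groups.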

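def find_neighbours(group, value):
    # groupby(...).apply passes each group's sub-frame as the first argument
    exactmatch = group[group.b == value]
    if not exactmatch.empty:
        return exactmatch.index
    else:
        # closest value below and closest value above within this group
        lowerneighbour_ind = group[group.b < value].b.idxmax()
        upperneighbour_ind = group[group.b > value].b.idxmin()
        return [lowerneighbour_ind, upperneighbour_ind]

df = df.groupby('a').apply(find_neighbours, 13)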

But since my dataset has about 16 million rows, this takes a very long time. Is there a faster way to get the result?

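Edit: Thanks for your answers. I forgot to add some information. If a closest number occurs multiple times, I want all of those rows to be carried over to the new dataframe. When a group has only an upper (lower) neighbour and no lower (upper) neighbour, its rows should be ignored.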

For 13, this leads to:

Thanks for your help!

Yes, we can speed this up:

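import numpy as np

v = 13
s = df.b - v                      # signed distance of every row to v
# smallest absolute distance per group and per side (sign) of v
t = s.abs().groupby([df.a, np.sign(s)]).transform('min')
df1 = df.loc[s.abs() == t]        # keep every row at that distance, ties included
# drop groups that have a neighbour on one side only
df1 = df1[df1.b.sub(v).groupby(df.a).transform('nunique') > 1]
df1
Out[102]: 
      a   b
1   600  12
2   600  15
5   700  11
6   700  19
9   900  12
10  900  14
11  900  14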
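
For anyone who wants to run this end to end, here is a minimal sketch that wraps the same steps in a function. The sample frame and the names closest_neighbours, group_col and value_col are made up for illustration. The question's original data is not shown here, so the numbers are only assumed to resemble it.

```python
import numpy as np
import pandas as pd


def closest_neighbours(df, value, group_col="a", value_col="b"):
    """Keep, per group, the rows closest to `value` from below and from above."""
    diff = df[value_col] - value          # signed distance to the target
    side = np.sign(diff)                  # -1 below, +1 above, 0 exact match
    # smallest absolute distance per (group, side); transform keeps df's index
    nearest = diff.abs().groupby([df[group_col], side]).transform("min")
    out = df[diff.abs() == nearest]       # ties are kept
    # a group must be represented on more than one side to survive
    sides = side.loc[out.index].groupby(out[group_col]).transform("nunique")
    return out[sides > 1]


# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    "a": [600] * 4 + [700] * 4 + [900] * 5 + [800],
    "b": [10, 12, 15, 17, 8, 11, 19, 21, 10, 12, 14, 14, 19, 20],
})

result = closest_neighbours(df, 13)
print(result)
```

With this sample data, group 800 has only an upper neighbour and is dropped, while both rows with b == 14 in group 900 are kept. Everything is vectorised, so no Python-level function runs per group.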

Try this:

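def neighbours(x):
    d = df.b - x  # signed distance of every row to x
    # first row at the smallest positive distance, then first row at the smallest negative distance
    return df.loc[[d[d == d[d > 0].min()].index[0], d[d == d[d < 0].max()].index[0]]]

neighbours(13)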
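
Note that the function above looks at the whole frame and returns only the first match on each side. A per-group variant that keeps ties and skips groups with a neighbour on one side only could look roughly like the sketch below. The sample data and the name neighbours_per_group are hypothetical, and exact matches of the value are not treated as neighbours here, which is an assumption on my part.

```python
import pandas as pd


def neighbours_per_group(df, value, group_col="a", value_col="b"):
    vals = df[value_col]
    # largest value below and smallest value above `value`, one per group
    lower = df.loc[vals < value].groupby(group_col)[value_col].max()
    upper = df.loc[vals > value].groupby(group_col)[value_col].min()
    # broadcast the per-group bounds back onto the rows (NaN if a side is missing)
    lo = df[group_col].map(lower)
    hi = df[group_col].map(upper)
    has_both = lo.notna() & hi.notna()
    # keep every row that equals one of its group's bounds, ties included
    return df[has_both & ((vals == lo) | (vals == hi))]


# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    "a": [600] * 4 + [700] * 4 + [900] * 5 + [800],
    "b": [10, 12, 15, 17, 8, 11, 19, 21, 10, 12, 14, 14, 19, 20],
})

result = neighbours_per_group(df, 13)
print(result)
```

This does two grouped reductions and two lookups instead of calling a Python function per group, which is usually the part that hurts on millions of rows.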
