Python DataFrame: find the closest matching value within a tolerance



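I have a DataFrame whose elements are lists. In each row I want to find the closest match to a given value, but only if it lies within a percentage tolerance of that value. My code: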

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
df
           A
0     [1, 2]
1  [4, 5, 6]
# In each row, find the index of the value that matches 5 within a 20% tolerance
val = 5
tol = 0.2  # accept values within 20% of 5, i.e. between 4 and 6
df['Matching_index'] = (df['A'].map(np.array) - val).map(abs).map(np.argmin)

Current output:

df
           A  Matching_index
0     [1, 2]               1    # 2 is the closest value to 5, but it is outside the tolerance, so this is wrong
1  [4, 5, 6]               1    # 5 matches 5, correct

Expected output:

df
           A  Matching_index
0     [1, 2]             NaN    # no value within the tolerance, hence NaN
1  [4, 5, 6]               1    # 5 matches 5, correct

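The idea is to take the absolute difference from val, replace the differences that fall outside the tolerance with missing values, and then apply np.nanargmin. Because np.nanargmin raises an error when every value is missing, an extra any() check is added so that such rows return NaN instead: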

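def f(x):
    # absolute distance of each element from val
    a = np.abs(np.array(x) - val)
    # True where the element lies within the tolerance
    m = a <= val * tol
    # index of the closest in-tolerance element, or NaN if there is none
    return np.nanargmin(np.where(m, a, np.nan)) if m.any() else np.nan

df['Matching_index'] = df['A'].map(f)
print(df)
           A  Matching_index
0     [1, 2]             NaN
1  [4, 5, 6]             1.0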

A pandas solution:

df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()
df['Matching_index'] = df1.where(df1 <= val * tol).dropna(how='all').idxmin(axis=1)
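
For anyone who wants to run the pandas approach end to end, here is a self-contained sketch of the same two steps. The helper name matching_index_pandas and its col parameter are made up for illustration and are not part of the answer above; the final reindex only makes explicit the index alignment that the column assignment performs anyway.

```python
import numpy as np
import pandas as pd

def matching_index_pandas(df, val, tol, col="A"):
    """Hypothetical helper: position of the closest in-tolerance value per row."""
    # Expand the lists into columns; shorter rows are padded with NaN.
    wide = pd.DataFrame(df[col].tolist(), index=df.index)
    # Absolute distance of every element from val.
    diff = wide.sub(val).abs()
    # Keep only distances within the tolerance, mask the rest as NaN.
    in_tol = diff.where(diff <= val * tol)
    # Drop rows with no candidate, then take the column label of the minimum.
    best = in_tol.dropna(how="all").idxmin(axis=1)
    # Rows that were dropped come back as NaN.
    return best.reindex(df.index)

df = pd.DataFrame({"A": [[1, 2], [4, 5, 6]]})
df["Matching_index"] = matching_index_pandas(df, val=5, tol=0.2)
print(df)
```

The column labels of the expanded frame are 0, 1, 2, ..., so the label returned by idxmin is also the position inside the original list.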

I'm not sure whether you want all the matching indices or just a count of them.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6, 7, 8]]})
val = 5
tol = 0.3

def closest(arr, val, tol):
    # indices of the elements strictly within val * tol of val
    idxs = [idx for idx, el in enumerate(arr) if np.abs(el - val) < val * tol]
    # return how many elements matched, or NaN if none did
    return len(idxs) if idxs else np.nan

df['Matching_index'] = df['A'].apply(closest, args=(val, tol))
df

If you want all the indices, just return idxs instead of len(idxs).
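
A minimal sketch of that variant, assuming you still want NaN for rows with no match. The function name closest_indices is only illustrative; the comparison is kept strict (<) as in the code above.

```python
import numpy as np
import pandas as pd

def closest_indices(arr, val, tol):
    """Illustrative variant: return every in-tolerance index instead of a count."""
    # indices of the elements strictly within val * tol of val
    idxs = [idx for idx, el in enumerate(arr) if abs(el - val) < val * tol]
    # an empty list means nothing matched, so report NaN as before
    return idxs if idxs else np.nan

df = pd.DataFrame({"A": [[1, 2], [4, 5, 6, 7, 8]]})
df["Matching_index"] = df["A"].apply(closest_indices, args=(5, 0.3))
print(df)
```

With val = 5 and tol = 0.3 the threshold is 1.5, so in the second row the values 4, 5 and 6 qualify and the first row has no match.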
