How to find the index of the closest valid value, given a data array and a boolean validity array

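Basically, I have two arrays of length L: a data array (let's call it D) that holds my actual data, and a validity array (here called V) of boolean values that indicates which of those values are valid.

For example, imagine I have: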

D = [10, 20, 40, 1000, 2000, -1000, 50, 20, 1000]
V = [1, 1, 1, 0, 0, 0, 1, 1, 0]

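In this case, my V array indicates that the values at indices 3, 4, 5 and 8 are invalid.

For each of these indices, I would like to replace the corresponding data value D[i] with the data from the closest valid index.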

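So my index-finding function (which takes only the validity array) would give:
f(V) == [0, 1, 2, 2, 2, 6, 6, 7, 7] (or
f(V) == [0, 1, 2, 2, 6, 6, 6, 7, 7]; index 4 is equally close to 2 and 6, so either is fine)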

In that case, I can correct my D array with

D[i] = D[f(V)[i]]

to obtain:

D == [10, 20, 40, 40, 40, 50, 50, 20, 20]

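Is something like this already implemented in Python? If not, how can I implement it easily?

Using a pandas Series, you can do it in one line (after initialization):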

import pandas as pd

d = pd.Series(D)
v = pd.Series(V).astype(bool)
out = d[v].reindex(d.index, method='nearest')

Or, as a list:

D2 = d[v].reindex(d.index, method='nearest').tolist()

With your data, the result is:

>>> D2
[10, 20, 40, 40, 50, 50, 50, 20, 20]

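Edit: numpy only:

Doing this with numpy alone is slightly more involved. First, we use a cumulative sum that resets at zero to find the index deltas (the distance to the nearest valid index), both forward and backward. Care must be taken with the cases where v starts and/or ends with 0.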

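import numpy as np

def cumsum_reset0(v):
    """Running count of consecutive ones in a 0/1 array, reset to 0 at each zero."""
    v = v.copy()
    zero = v == 0
    c = np.cumsum(~zero)
    # At each zero, subtract the number of ones seen since the previous zero.
    v[zero] = -np.diff(np.r_[0, c[zero]])
    return np.cumsum(v)

def closest_index(v):
    assert np.any(v), "no valid index found"
    n = len(v)
    a = cumsum_reset0(1 - v)                # distance back to the previous valid index
    b = cumsum_reset0((1 - v)[::-1])[::-1]  # distance ahead to the next valid index
    i = np.arange(n)
    a[i - a < 0] = n    # no valid index to the left
    b[i + b >= n] = n   # no valid index to the right
    return np.where(a < b, i - a, i + b)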

Examples:

>>> closest_index(np.array(V))
array([0, 1, 2, 2, 6, 6, 6, 7, 7])
>>> closest_index(np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0]))
array([2, 2, 2, 3, 3, 6, 6, 7, 7, 7])
>>> closest_index(np.array([0, 0, 0, 1, 0]))
array([3, 3, 3, 3, 3])

However:

>>> closest_index(np.array([0, 0, 0, 0, 0]))
AssertionError: no valid index found
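
The forward and backward distances described in the numpy-only answer can also be traced with plain loops. The following is only an illustrative sketch, not the answer's own code: the name closest_valid_index and the list-based passes are assumptions made for the example, and it raises ValueError where the numpy version uses an assert. It breaks ties toward the later index, as np.where(a < b, i - a, i + b) does.

```python
def closest_valid_index(valid):
    """Return, for each position, the index of the nearest truthy entry in `valid`.

    Hypothetical helper for illustration; mirrors the two-pass idea above.
    """
    n = len(valid)
    if not any(valid):
        raise ValueError("no valid index found")

    # Forward pass: distance back to the most recent valid index.
    # The value n stands for "no valid index seen yet".
    back = [n] * n
    dist = n
    for i in range(n):
        dist = 0 if valid[i] else min(dist + 1, n)
        back[i] = dist

    # Backward pass: distance ahead to the next valid index.
    ahead = [n] * n
    dist = n
    for i in range(n - 1, -1, -1):
        dist = 0 if valid[i] else min(dist + 1, n)
        ahead[i] = dist

    # On a tie the later index wins, as in the numpy version.
    return [i - back[i] if back[i] < ahead[i] else i + ahead[i] for i in range(n)]


print(closest_valid_index([1, 1, 1, 0, 0, 0, 1, 1, 0]))
# [0, 1, 2, 2, 6, 6, 6, 7, 7]
```

Both passes are linear in the length of the array, so this has the same asymptotic cost as the cumulative-sum version, just without vectorization.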

You can use pandas and interpolate:

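import pandas as pd

df = pd.DataFrame({'D': D, 'V': V})
D2 = (df['D']
      .mask(df['V'].eq(0))            # turn invalid values into NaN
      .interpolate(method='nearest')  # requires SciPy
      .ffill(downcast='infer')        # fill the trailing gap; downcast= is deprecated in recent pandas
      .tolist()
     )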

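Output: [10, 20, 40, 40, 40, 50, 50, 20, 20]

Here is how to do it using only built-in Python functions (i.e. without numpy, pandas or any other library). Given the index of an invalid data element, the f() function literally searches V for the closest valid index.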

D = [10, 20, 40, 1000, 2000, -1000, 50, 20, 1000]
V = [1, 1, 1, 0, 0, 0, 1, 1, 0]

def f(V):
    def index_of_closest(i, V):
        """Find the valid index closest to index i, based on the V validity array."""
        closest_index = V.index(1)  # Init to index of first valid element.
        min_diff = abs(closest_index-i)  # Init to difference between indices.
        for index in range(len(V)):  # Search for an even closer index.
            if V[index]:  # Valid?
                index_diff = abs(index-i)
                if index_diff < min_diff:
                    closest_index = index
                    min_diff = index_diff
        return closest_index

    # Iterate over V itself so that f() does not depend on the global D.
    return [i if V[i] else index_of_closest(i, V) for i in range(len(V))]

if __name__ == '__main__':
    r = f(V)
    print(f'{D=}')  # -> D=[10, 20, 40, 1000, 2000, -1000, 50, 20, 1000]
    print(f'{r=}')  # -> r=[0, 1, 2, 2, 2, 6, 6, 7, 7]
    print()
    print('Updated D')
    D = [D[i] for i in f(V)]  # -> D=[10, 20, 40, 40, 40, 50, 50, 20, 20]
    print(f'{D=}')
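
The same brute-force search can be written more compactly with the built-in min(). This is a sketch under assumptions rather than part of the answer above: the names nearest_valid_indices, idx and corrected are hypothetical, and it collects the valid indices once instead of rescanning V for every element. Because min() returns the first minimal item, ties resolve to the lower index, which matches the output of f(V).

```python
def nearest_valid_indices(validity):
    """Hypothetical helper: map every position to its nearest valid index."""
    # Collect the valid indices once, in ascending order.
    valid = [j for j, ok in enumerate(validity) if ok]
    if not valid:
        raise ValueError("no valid index found")
    # min() keeps the first minimal candidate, so ties go to the lower index.
    return [min(valid, key=lambda j: abs(j - i)) for i in range(len(validity))]


D = [10, 20, 40, 1000, 2000, -1000, 50, 20, 1000]
V = [1, 1, 1, 0, 0, 0, 1, 1, 0]

idx = nearest_valid_indices(V)
corrected = [D[j] for j in idx]
print(idx)        # [0, 1, 2, 2, 2, 6, 6, 7, 7]
print(corrected)  # [10, 20, 40, 40, 40, 50, 50, 20, 20]
```

The cost is proportional to the array length times the number of valid entries, so for long arrays a two-pass or vectorized approach is likely the better fit.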
