Efficiently find the next greater value in another array



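Is it possible to remove the for loops in this function and speed it up? I have not been able to get the same result with a vectorized version of this function. Or is there another option?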

import numpy as np
indices = np.array(
    [814, 935, 1057, 3069, 3305, 3800, 4093, 4162, 4449])
within = np.array(
    [193, 207, 243, 251, 273, 286, 405, 427, 696,
     770, 883, 896, 1004, 2014, 2032, 2033, 2046, 2066,
     2079, 2154, 2155, 2156, 2157, 2158, 2159, 2163, 2165,
     2166, 2167, 2183, 2184, 2208, 2210, 2212, 2213, 2221,
     2222, 2223, 2225, 2226, 2227, 2281, 2282, 2338, 2401,
     2611, 2612, 2639, 2640, 2649, 2700, 2775, 2776, 2785,
     3030, 3171, 3191, 3406, 3427, 3527, 3984, 3996, 3997,
     4024, 4323, 4331, 4332])

def get_first_ind_after(indices, within):
    """returns array of the first index after each listed in indices
    indices and within must be sorted ascending
    """
    first_after_leading = []
    for index in indices:
        for w_ind in within:
            if w_ind > index:
                first_after_leading.append(w_ind)
                break
    # convert to np array
    first_after_leading = np.array(first_after_leading).flatten()
    return np.unique(first_after_leading)

For each number in the indices array it should return the next greater number in within, if there is one.

# Output:
[ 883 1004 2014 3171 3406 3984 4323]
Here's one approach based on np.searchsorted -

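def next_greater(indices, within):
    # side='right' finds the first position whose value is strictly greater
    # than each index, matching the `w_ind > index` test in the question
    idx = np.searchsorted(within, indices, side='right')
    # Drop indices that have no greater value in within
    idxv = idx[idx<len(within)]
    idxv_unq = np.unique(idxv)
    return within[idxv_unq]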
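
One detail worth checking with np.searchsorted is the side argument: with the default side='left' it returns the position of the first element that is greater than or equal to the query, while side='right' returns the position of the first element that is strictly greater. The sketch below compares both against the double loop from the question. The small arrays and the function names next_greater_loop and next_greater_sorted are made up for illustration; the value 5 appears in both arrays on purpose.

```python
import numpy as np

def next_greater_loop(indices, within):
    # Reference behaviour: the double loop from the question (strict >)
    out = []
    for index in indices:
        for w in within:
            if w > index:
                out.append(w)
                break
    return np.unique(np.array(out, dtype=within.dtype))

def next_greater_sorted(indices, within, side="right"):
    # `within` must be sorted ascending for searchsorted to be valid
    idx = np.searchsorted(within, indices, side=side)
    idxv = idx[idx < len(within)]      # drop queries with nothing greater
    return within[np.unique(idxv)]

indices = np.array([3, 5, 9])
within = np.array([1, 5, 7, 8])

print(next_greater_loop(indices, within))            # [5 7]
print(next_greater_sorted(indices, within))          # [5 7]
print(next_greater_sorted(indices, within, "left"))  # [5]
```

On the sample data in the question no value occurs in both arrays, so either side gives the same output there.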

Alternatively, idxv_unq could be computed like so, which should be more efficient because idxv is already sorted -

idxv_unq = idxv[np.r_[True,idxv[:-1] != idxv[1:]]]
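
To see why the neighbour comparison works: in a sorted array, duplicates are always adjacent, so keeping the first element plus every element that differs from its predecessor removes them without the sort that np.unique performs. A minimal sketch follows; the helper name unique_sorted and the example values are assumptions, and the empty-array guard is an addition, since the one-line mask has length 1 even when idxv is empty.

```python
import numpy as np

def unique_sorted(a):
    # Assumes `a` is a sorted 1-D array; the result is wrong otherwise
    if a.size == 0:
        return a                       # nothing to deduplicate
    # True for the first element and wherever the value changes
    keep = np.r_[True, a[:-1] != a[1:]]
    return a[keep]

idxv = np.array([10, 12, 13, 55, 57, 60, 64, 64])
print(unique_sorted(idxv))             # [10 12 13 55 57 60 64]
```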

Try this:

[within[within>x][0] if len(within[within>x])>0 else 0 for x in indices]

For example,

In [35]: import numpy as np
    ...: indices = np.array([814, 935, 1057, 3069, 3305, 3800, 4093, 4162, 4449])
    ...:
    ...: within = np.array(
    ...:     [193, 207, 243, 251, 273, 286, 405, 427, 696,
    ...:      770, 883, 896, 1004, 2014, 2032, 2033, 2046, 2066,
    ...:      2079, 2154, 2155, 2156, 2157, 2158, 2159, 2163, 2165,
    ...:      2166, 2167, 2183, 2184, 2208, 2210, 2212, 2213, 2221,
    ...:      2222, 2223, 2225, 2226, 2227, 2281, 2282, 2338, 2401,
    ...:      2611, 2612, 2639, 2640, 2649, 2700, 2775, 2776, 2785,
    ...:      3030, 3171, 3191, 3406, 3427, 3527, 3984, 3996, 3997,
    ...:      4024, 4323, 4331, 4332])
In [36]: [within[within>x][0] if len(within[within>x])>0 else 0 for x in indices]
Out[36]: [883, 1004, 2014, 3171, 3406, 3984, 4323, 4323, 0]

This is a Pythonic approach called a list comprehension, which is a shortened form of a foreach loop. So if I were to expand it:

result = []
for x in indices:
    # Boolean mask: keeps every item of within whose value is greater than x
    y = within[within > x]
    # y now holds all items larger than x. We want the first of them, but y
    # may be empty (no value matches the condition), so check for that
    if len(y) > 0:
        z = y[0]
    else:
        z = 0  # or None or whatever you like
    # Now add this value to the list that we are building
    result.append(z)
# Now result holds one value per entry of indices

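I wrote it this way because it uses a vector operation (the boolean mask) and also takes advantage of a list comprehension, which is a simpler way to write a foreach loop that returns a list.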
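
If one result per entry of indices is wanted (as in Out[36]) but without a Python-level loop, the two ideas can be combined. The following is only a sketch under the same assumption that within is sorted ascending; the function name next_greater_per_index and the fill parameter are hypothetical, with 0 as the placeholder to mirror the list comprehension.

```python
import numpy as np

def next_greater_per_index(indices, within, fill=0):
    # Position of the first value in `within` strictly greater than each index
    pos = np.searchsorted(within, indices, side="right")
    found = pos < len(within)          # False where nothing greater exists
    out = np.full(indices.shape, fill, dtype=within.dtype)
    out[found] = within[pos[found]]
    return out

indices = np.array([814, 935, 1057, 3069, 3305, 3800, 4093, 4162, 4449])
within = np.array(
    [193, 207, 243, 251, 273, 286, 405, 427, 696,
     770, 883, 896, 1004, 2014, 2032, 2033, 2046, 2066,
     2079, 2154, 2155, 2156, 2157, 2158, 2159, 2163, 2165,
     2166, 2167, 2183, 2184, 2208, 2210, 2212, 2213, 2221,
     2222, 2223, 2225, 2226, 2227, 2281, 2282, 2338, 2401,
     2611, 2612, 2639, 2640, 2649, 2700, 2775, 2776, 2785,
     3030, 3171, 3191, 3406, 3427, 3527, 3984, 3996, 3997,
     4024, 4323, 4331, 4332])

print(next_greater_per_index(indices, within).tolist())
# [883, 1004, 2014, 3171, 3406, 3984, 4323, 4323, 0]
```

Unlike the boolean-mask version, which scans all of within for every index, this does one binary search per index.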
