Is there a way to identify clusters of zeros and remove them from a NumPy array?



I want to remove zeros from the middle of a NumPy array (but not all of the zeros should be removed).

Several Stack Overflow examples show how to remove zeros, but I still find it hard to write the logic for my particular problem.

import numpy as np
a = np.array([255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,207,0,0
,159,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,64,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,88,239,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,231,88])

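I have an array a with some non-zero values and one large cluster of zeros (plus a few zeros scattered among the non-zero values). I want to remove that large cluster of zeros, or find the index where it starts, and then split the array into the following two pieces: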

a1 = [255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,207,0,0
,159,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,64]

a2=[88,239,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,231,88]

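Note that a1 still contains some zeros. Zeros should only be removed where there is a certain number of consecutive zeros (e.g. 10). I can do this with a loop over the array, but it would be great if someone could suggest a simpler, faster way.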
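For the "simpler, faster" part, one common fully vectorized pattern is to mark the zeros, pad the mask with a non-zero on both ends, and take np.diff so that +1 marks the start of each zero run and -1 marks one past its end. The start indices of the long runs answer "where does the big cluster begin", and slicing between runs gives the pieces. A minimal sketch, assuming a 1-D array and treating "long" as more than max_consecutive zeros (the same threshold the answers below use); long_zero_runs and split_on_long_zero_runs are hypothetical names:

```python
import numpy as np

def long_zero_runs(arr, max_consecutive=10):
    """Return (starts, ends) of zero runs longer than max_consecutive; ends are exclusive."""
    is_zero = (np.asarray(arr) == 0).astype(np.int8)
    # Pad with 0 on both sides so every zero run has a visible start and end
    edges = np.diff(np.concatenate(([0], is_zero, [0])))
    starts = np.flatnonzero(edges == 1)   # first index of each zero run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each zero run
    keep = (ends - starts) > max_consecutive
    return starts[keep], ends[keep]

def split_on_long_zero_runs(arr, max_consecutive=10):
    """Drop every long zero run and return the pieces between them."""
    arr = np.asarray(arr)
    starts, ends = long_zero_runs(arr, max_consecutive)
    # Piece k runs from the end of long run k-1 to the start of long run k
    lo = np.concatenate(([0], ends))
    hi = np.concatenate((starts, [len(arr)]))
    return [arr[b:e] for b, e in zip(lo, hi)]
```

With the array from the question, a1, a2 = split_on_long_zero_runs(a) should give the two pieces, and long_zero_runs(a)[0][0] is the index where the large cluster starts. Runs of exactly max_consecutive zeros or fewer are kept.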

Without any imports, using a single loop:

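def remove_clusters(my_array, cluster_value=0, consecutive_max=10):
    my_result = [[]]      # list of pieces; start with one empty piece
    cluster_list = []     # the current run of cluster_value
    for e in my_array:
        if e == cluster_value:
            cluster_list.append(e)
        else:
            if len(cluster_list) <= consecutive_max:
                # Short run: keep it in the current piece
                my_result[-1].extend(cluster_list)
            else:
                # Long run: drop it and start a new piece
                my_result.append([])
            cluster_list = []
            my_result[-1].append(e)
    # Flush a short run left over at the end of the array (a long one is dropped)
    if len(cluster_list) <= consecutive_max:
        my_result[-1].extend(cluster_list)
    return my_result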

I came up with this using itertools.groupby, which simplifies the code a bit:

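from itertools import groupby

def remove_clusters(my_array, cluster_value=0, max_consecutive=10):
    my_result = [[]]
    # groupby yields one (value, group) pair per run of equal consecutive values
    for k, g in groupby(my_array):
        g = list(g)
        if k != cluster_value or len(g) <= max_consecutive:
            # Not a long cluster: keep it in the current piece
            my_result[-1].extend(g)
        else:
            # Long cluster: drop it and start a new piece
            my_result.append([])
    return my_result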
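The groupby version works because itertools.groupby only merges adjacent equal values, so every run of zeros arrives as a single group whose length can be checked directly. A small illustration on a made-up input:

```python
from itertools import groupby

data = [7, 0, 0, 7, 7, 0, 0, 0]
# Each run of equal *adjacent* values becomes one (key, group) pair
runs = [(key, len(list(group))) for key, group in groupby(data)]
print(runs)  # [(7, 1), (0, 2), (7, 2), (0, 3)]
```

Note that the two runs of zeros stay separate, which is exactly what lets remove_clusters tell a short run from a long one.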

Then you can do:

a1, a2 = remove_clusters(a)

Finally, a really ugly one-liner using functools.reduce:

from itertools import groupby
from functools import reduce
a1, a2 = reduce(lambda x,y: x + [[]] if not y[0] and len(y)>10 
                       else x[:-1] + [x[-1]+y], 
                map(lambda x: list(x[1]), groupby(a)), 
                [[]])

I'd love to explain this one-liner, but I no longer understand it myself.
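Read step by step, the one-liner appears to do this: groupby splits the input into runs, each run becomes a list, and reduce folds the runs left to right into a list of pieces starting from [[]]. A long run of zeros (the `not y[0] and len(y) > 10` test) opens a new empty piece; any other run is appended to the last piece. The same logic with named steps (step and remove_clusters_reduce are hypothetical names, and the threshold of 10 is made a parameter):

```python
from functools import reduce
from itertools import groupby

def step(acc, run, max_consecutive=10):
    """One reduce step: acc is the list of pieces so far, run is one groupby run as a list."""
    if run[0] == 0 and len(run) > max_consecutive:
        # Long run of zeros: drop it and start a new, empty piece
        return acc + [[]]
    # Any other run: append it to the last piece
    return acc[:-1] + [acc[-1] + run]

def remove_clusters_reduce(arr):
    runs = (list(g) for _, g in groupby(arr))
    # Start from one empty piece and fold every run into the result
    return reduce(step, runs, [[]])
```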

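Here is another approach that mostly uses NumPy arrays rather than plain Python lists (apart from the list comprehensions). The basic idea is to build a list of (value, count) pairs, [(value1, count1), (value2, count2), ...], one per run of equal values, and then search that list for the runs that meet the condition.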

A few things could be improved, mainly that the condition is checked twice.

import numpy as np
a = np.array([255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,207,0,0
,159,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,64,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,88,239,255,255,255,255,255,255,255
,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,231,88])
def remove_consecutive(thearray, num, count):
    # Find the indices where the value *changes*; each one starts a new run
    adiff = np.diff(thearray)
    indices = np.hstack((np.array([0]), np.where(adiff)[0] + 1))
    # The run lengths are the differences between consecutive start indices;
    # append the array length so the last run is measured correctly
    lengths = np.diff(indices, append=len(thearray))
    # Build a list of (value, length) tuples
    the_list = list(zip(thearray[indices], lengths))
    # Find the runs where we need to split, based on the num/count requirement
    index_breaks = np.array([ii for ii, (lvalue, lcount) in enumerate(the_list)
                             if lvalue == num and lcount > count], dtype=int)
    # Split the (value, length) list at those runs
    the_list_split = np.split(np.array(the_list), index_breaks)
    # Expand each piece back out, dropping the long runs themselves
    output_list = [np.array([lvalue for lvalue, lcount in sublist for _ in range(lcount)
                             if not (lvalue == num and lcount > count)])
                   for sublist in the_list_split]
    # The pieces have different lengths, so return a list of arrays
    # (np.array(output_list) would be a ragged array and fails in recent NumPy)
    return output_list
a1, a2 = remove_consecutive(a, 0, 10)
print(a1)
print(a2)

The output is:

[255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
   0 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 207   0   0 159 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255  64]
[ 88 239 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255 231  88]
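On the "condition is checked twice" point: one way to evaluate it only once is to keep the run-length data as NumPy arrays, compute a per-run flag, and expand it back to one entry per element with np.repeat. Numbering the pieces with a cumulative sum of that flag then gives the split without a second pass over the condition. A minimal sketch, assuming a non-empty 1-D array (remove_consecutive_once is a hypothetical name):

```python
import numpy as np

def remove_consecutive_once(arr, num, count):
    """Run-length variant that evaluates the num/count condition a single time."""
    arr = np.asarray(arr)
    # Start index of every run of equal values
    starts = np.flatnonzero(np.concatenate(([True], arr[1:] != arr[:-1])))
    lengths = np.diff(np.append(starts, len(arr)))
    values = arr[starts]
    # The condition, evaluated once: one boolean per run
    is_long = (values == num) & (lengths > count)
    # Piece number of each run = how many long runs start at or before it
    run_piece = np.cumsum(is_long)
    # Expand the per-run data back to one entry per element
    elem_piece = np.repeat(run_piece, lengths)
    keep = ~np.repeat(is_long, lengths)
    return [arr[keep & (elem_piece == k)] for k in range(run_piece[-1] + 1)]
```

As with the answer's version, a1, a2 = remove_consecutive_once(a, 0, 10) should produce the two pieces for the question's array. The per-piece masks make this O(n × pieces), which is fine when there are only a few long runs.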
