Python generator to split iteration over the upper triangle of a matrix evenly (for parallelization)



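I'm using the following code to iterate over the upper triangular part of a matrix in parallel, but I'd prefer to do it without materializing the entire set of index pairs.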

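The goal is to process every item in the upper triangular part of the matrix, but to do that processing in parallel. I'm also open to using third-party libraries (numpy, etc.) if they have tools that help with this.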

import itertools
import math
import multiprocessing

def process_pairs(cur_pairs):
    for i, j in cur_pairs:
        pass  # do some stuff with (i, j)

n_processes = 4
n = 1000  # num cols/rows in matrix
# This materializes all n*(n-1)/2 index pairs in memory at once.
pairs = list(itertools.combinations(xrange(n), 2))
# Round up so the last chunk picks up any remainder.
per_chunk = int(math.ceil(len(pairs) / float(n_processes)))
pair_chunks = [pairs[i*per_chunk:(i+1)*per_chunk] for i in xrange(n_processes)]
processes = [multiprocessing.Process(target=process_pairs, args=(chunk,))
             for chunk in pair_chunks]
for p in processes:
    p.start()
for p in processes:
    p.join()

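Any clever ideas for expressing this as a generator (i.e., without generating all the index pairs up front)? As it stands, the pairs have to be loaded into memory, and if n is very large, that is a memory hit I'd like to avoid.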

Maybe something like this (converted to Python 3):

from itertools import combinations_with_replacement, islice, tee
n_processes = 3
n = 10  # num cols/rows in matrix
pairs = ((i, j) for i, j in combinations_with_replacement(range(n), 2) if i != j)
pair_chunks = [
  islice(p, i, None, n_processes)
  for i, p in enumerate(tee(pairs, n_processes))
]
print(pair_chunks)
print([list(x) for x in pair_chunks])

Output:

[<itertools.islice object at 0x7f2149fbe138>, <itertools.islice object at 0x7f2149fbecc8>, <itertools.islice object at 0x7f2149fbe228>]
[[(0, 1), (0, 4), (0, 7), (1, 2), (1, 5), (1, 8), (2, 4), (2, 7), (3, 4), (3, 7), (4, 5), (4, 8), (5, 7), (6, 7), (7, 8)], [(0, 2), (0, 5), (0, 8), (1, 3), (1, 6), (1, 9), (2, 5), (2, 8), (3, 5), (3, 8), (4, 6), (4, 9), (5, 8), (6, 8), (7, 9)], [(0, 3), (0, 6), (0, 9), (1, 4), (1, 7), (2, 3), (2, 6), (2, 9), (3, 6), (3, 9), (4, 7), (5, 6), (5, 9), (6, 9), (8, 9)]]

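This uses tee to make n_processes independent copies of the generator, then wraps each copy in an islice that starts at a different offset and advances n_processes steps at a time, so no list of pairs is ever built.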
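Since itertools.combinations(range(n), 2) already yields exactly the pairs with i < j, the tee step is not strictly necessary: each worker can stride over its own fresh combinations iterator. A minimal sketch of that variant follows; strided_pairs is a hypothetical helper name. Each worker still steps over every pair and discards the ones belonging to other workers, but it only ever holds one pair at a time.

```python
from itertools import combinations, islice

def strided_pairs(n, n_workers, k):
    """Yield every n_workers-th upper-triangle pair (i < j), starting at offset k.

    Hypothetical helper: each call builds its own combinations iterator,
    so no tee buffer is shared between workers.
    """
    # combinations(range(n), 2) yields (i, j) with i < j in row-major order
    return islice(combinations(range(n), 2), k, None, n_workers)

if __name__ == "__main__":
    for k in range(3):
        print(k, list(strided_pairs(10, 3, k))[:4])
```

For n = 10 and 3 workers this yields the same three chunks as the tee version.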

Or, a complete example using processes:

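# Relies on the 'fork' start method (the default on Linux before Python 3.14):
# each child inherits its islice object. Under 'spawn' or 'forkserver'
# (e.g. the default on Windows and macOS) the arguments would have to be
# pickled, and the underlying generator expression cannot be pickled.
from multiprocessing import Process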
from itertools import combinations_with_replacement, islice, tee
n_processes = 3
n = 10  # num cols/rows in matrix
pairs = ((i, j) for i, j in combinations_with_replacement(range(n), 2) if i != j)
pair_chunks = [
    islice(p, i, None, n_processes)
    for i, p in enumerate(tee(pairs, n_processes))
]
def process_pairs(i, pair_chunk):
    print('process %d received type %s' % (i, type(pair_chunk)))
    for x in pair_chunk:
        print('process %d processing %s' % (i, x))
processes = [
    Process(target=process_pairs, args=[i, pair_chunk])
    for i, pair_chunk in enumerate(pair_chunks)
]
for p in processes:
    p.start()
for p in processes:
    p.join()

Output:

process 0 received type <class 'itertools.islice'>
process 1 received type <class 'itertools.islice'>
process 1 processing (0, 2)
process 1 processing (0, 5)
process 1 processing (0, 8)
process 1 processing (1, 3)
process 1 processing (1, 6)
process 1 processing (1, 9)
process 1 processing (2, 5)
process 1 processing (2, 8)
process 1 processing (3, 5)
process 0 processing (0, 1)
...
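Both versions above make every process walk the entire pair sequence, and the process version depends on the islice objects being inherited via fork. A minimal sketch of an alternative, assuming row-major pair order as produced by itertools.combinations: give each worker a contiguous range of linear pair indices and let it start its own generator directly at that index. Only three small integers are sent to each worker, so this should also work with the 'spawn' start method. pairs_in_range, chunk_bounds and process_chunk are hypothetical names, and the counting loop stands in for the real per-pair work.

```python
from itertools import combinations
from multiprocessing import Pool

def pairs_in_range(n, start, stop):
    """Yield upper-triangle pairs (i, j), i < j, whose row-major linear
    indices are start..stop-1, without generating the earlier pairs."""
    # Row i holds n - 1 - i pairs; skip whole rows to reach the row containing `start`.
    i, offset = 0, start
    while i < n - 1 and offset >= n - 1 - i:
        offset -= n - 1 - i
        i += 1
    j = i + 1 + offset
    remaining = stop - start
    while remaining > 0 and i < n - 1:
        yield (i, j)
        remaining -= 1
        j += 1
        if j == n:  # end of row: move to the first pair of the next row
            i += 1
            j = i + 1

def chunk_bounds(n, n_workers, k):
    """Half-open [start, stop) range of linear pair indices for worker k."""
    total = n * (n - 1) // 2
    return k * total // n_workers, (k + 1) * total // n_workers

def process_chunk(args):
    """Hypothetical worker: receives plain ints, builds its own generator."""
    n, n_workers, k = args
    start, stop = chunk_bounds(n, n_workers, k)
    count = 0
    for i, j in pairs_in_range(n, start, stop):
        count += 1  # placeholder for the real per-pair work
    return count

if __name__ == "__main__":
    n, n_workers = 1000, 4
    with Pool(n_workers) as pool:
        counts = pool.map(process_chunk, [(n, n_workers, k) for k in range(n_workers)])
    print(counts, sum(counts))
```

Locating the starting row here is a simple O(n) loop per chunk. It could be replaced with a closed-form inverse of the row offsets, but the loop is negligible next to iterating O(n²) pairs.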

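Presumably, if you don't want the indices, the generator will need to return the values directly.

Likewise, it sounds like you don't want to flatten the upper triangle, so the row vectors need to stay separate.

With those assumed requirements, here is a generator that yields successive slices of the row vectors (note that it includes the diagonal):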

>>> def generate_upper_triangular(m):
...     for i, row in enumerate(m):
...         yield row[i:]
...
>>> m = [[1, 2, 3],
...      [0, 5, 6],
...      [0, 0, 9]]
>>> for vec in generate_upper_triangular(m):
...     print(vec)
...
[1, 2, 3]
[5, 6]
[9]
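If whole rows are handed to processes, splitting the rows evenly by count does not split the work evenly, because row i has n - 1 - i entries strictly above the diagonal. One common balancing trick, sketched here under the assumption that the strict triangle (row[i + 1:]) is wanted, as in the question: pair row i with row n - 2 - i, which together always hold n entries, and deal those row pairs round-robin to the workers. rows_for_worker and upper_rows are hypothetical helpers.

```python
def rows_for_worker(n, n_workers, k):
    """Yield the strict-upper-triangle row indices assigned to worker k.

    Rows 0..n-2 have n-1-i entries above the diagonal. Row i is paired with
    row n-2-i so each pair holds n entries in total (the middle row, when
    there is one, stays unpaired). Pairs are dealt round-robin to workers.
    """
    for i in range(k, n // 2, n_workers):
        yield i
        partner = n - 2 - i
        if partner != i:
            yield partner

def upper_rows(m, rows):
    """Yield (row_index, entries strictly above the diagonal) for the given rows."""
    for i in rows:
        yield i, m[i][i + 1:]

if __name__ == "__main__":
    n, n_workers = 1000, 4
    # Number of upper-triangle entries each worker would process
    work = [sum(n - 1 - i for i in rows_for_worker(n, n_workers, k))
            for k in range(n_workers)]
    print(work)  # [125000, 125000, 125000, 124500]
```

Each worker can then run upper_rows(m, rows_for_worker(len(m), n_workers, k)) on its own share. For small n relative to the number of workers the balance is coarser, since there are only about n/2 row pairs to deal out.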