Python multiprocessing with shared output

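I'm trying to process a large number of text files and compute some data from them (simple additions). The problem is that this takes a long time. I know other languages have multiprocessing features, but I've never done anything like this in Python.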

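Suppose I have a directory with 16,000 files. Currently I open each file individually, load it into a Python array, do some manipulation of the data, and then write the output to a main array (of length 16,000). Can I use multiprocessing to run several instances of "open the file, process the data, and output the information" at the same time?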

The original code is basically like this:

import os
import numpy as np

# path to the directory of text files
filepath = '/path/to/file'
# Get the dir contents
filedir = os.listdir(filepath)
# Pre-allocate large array
large_array = np.zeros(len(filedir))
# Begin loop
for i in range(len(filedir)):
    # Define the path to load the text file
    filename = os.path.join(filepath, filedir[i])
    # function_to_process_filename is my own processing step (not shown)
    large_array[i] = function_to_process_filename(filename)
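
The question never shows function_to_process_filename. To make the rest of the thread concrete, here is a hypothetical stand-in that does the "simple additions" mentioned above. It assumes each file holds whitespace-separated decimal numbers and returns their sum:

```python
def function_to_process_filename(filename):
    """Hypothetical stand-in: sum every whitespace-separated number in a text file."""
    total = 0.0
    with open(filename) as f:
        for line in f:
            # each token is assumed to be a plain decimal number
            total += sum(float(tok) for tok in line.split())
    return total
```

Each call touches only one file and returns one number. That independence is what makes the loop easy to parallelise.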

Where would the multiprocessing/parallel part go to make the code run faster, and what would it look like in Python?

You can use a multiprocessing Pool to submit work to a pool of processes.
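
Here is a minimal sketch of submitting individual tasks to a pool with Pool.apply_async. The worker add_pair is hypothetical and stands in for "process one file":

```python
from multiprocessing import Pool

def add_pair(a, b):
    # hypothetical unit of work; in the question this would process one file
    return a + b

def run_tasks(pairs, processes=4):
    with Pool(processes=processes) as pool:
        # submit every task first so the workers can run them concurrently
        async_results = [pool.apply_async(add_pair, pair) for pair in pairs]
        # .get() blocks until each result is ready; order matches submission
        return [r.get() for r in async_results]

if __name__ == "__main__":
    print(run_tasks([(1, 2), (3, 4), (5, 6)]))  # [3, 7, 11]
```

apply_async is handy when tasks are submitted one at a time. For a ready-made list such as a directory listing, map is usually simpler.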

The map function takes an iterable and splits it into chunks of work to which your function is applied (from the Pool.map documentation):

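This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.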
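
The chunksize parameter can be seen in action in the sketch below, which uses a hypothetical square worker. Whatever the chunk size, map returns results in the same order as the input:

```python
from multiprocessing import Pool

def square(x):
    # hypothetical worker
    return x * x

def squares(n, chunksize):
    with Pool(processes=4) as pool:
        # range(n) is split into roughly n / chunksize tasks for the workers
        return pool.map(square, range(n), chunksize=chunksize)

if __name__ == "__main__":
    print(squares(10, chunksize=3))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

If chunksize is omitted, Pool.map picks a value automatically from the length of the iterable and the number of workers. Larger chunks mean fewer round trips between processes, so the best value is worth measuring for your workload.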

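In your example, you could pass the list of file names to map, together with a function that opens and manipulates one file. That function returns the processed result, and you combine everything in the main process.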
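
Applied to the question's loop, this could look like the sketch below. It assumes the per-file work is summing numbers, done by a hypothetical process_file, and collects the results into a NumPy array in the parent process:

```python
import os
from multiprocessing import Pool, cpu_count

import numpy as np

def process_file(filename):
    # hypothetical per-file work: sum every number in the file
    total = 0.0
    with open(filename) as f:
        for line in f:
            total += sum(float(tok) for tok in line.split())
    return total

def process_directory(dir_path, processes=None):
    # sort so the output order is deterministic (os.listdir order is arbitrary)
    names = sorted(os.listdir(dir_path))
    paths = [os.path.join(dir_path, name) for name in names]
    with Pool(processes=processes or cpu_count()) as pool:
        # map preserves input order, so result i belongs to paths[i]
        sums = pool.map(process_file, paths)
    # join everything in the main process
    return names, np.asarray(sums, dtype=float)
```

Call it as names, large_array = process_directory('/path/to/dir'). On platforms that start workers with spawn (Windows, macOS), make that call inside an if __name__ == '__main__': block. No shared state is needed here, because each worker only returns a value.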

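So if I understand correctly, you are looking for a way to have several multiprocessing jobs fill a single Python data structure? I would indeed use map, as in the previous good answer, but this can also be done with multiprocessing.Manager():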

import os
from functools import partial
from multiprocessing import Pool, Manager, cpu_count

# path to dir
dir_path = '/path/to/dir'

def processing_func(results_array, index, filename):
    # process filename (placeholder: replace with the real processing)
    value = 0.0
    # add element to results_array, in this file's own slot
    results_array[index] = value

if __name__ == '__main__':
    # Get the dir content
    files = [os.path.join(dir_path, f) for f in os.listdir(dir_path)]
    NB_CPU = cpu_count()
    with Manager() as manager:
        # 'd' is an array-module typecode (double); change it to match what the array will contain
        results_array = manager.Array('d', [0.0] * len(files))
        with Pool(processes=NB_CPU) as pool:
            # this is used to pass args to multiprocessed function
            function_with_args = partial(processing_func, results_array)
            # starmap unpacks each (index, filename) pair; the pool runs NB_CPU processes at a time
            pool.starmap(function_with_args, enumerate(files))
        # copy the results out before the manager shuts down
        results = list(results_array[:])
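
If the results are plain numbers, a lighter alternative to a Manager is a multiprocessing.Array in shared memory, handed to each worker once through the pool's initializer. Note that this technique is swapped in here and is not what the answer above uses. The sketch assumes one float per item and a hypothetical compute worker:

```python
from multiprocessing import Array, Pool

_shared = None  # set in each worker process by _init_worker

def _init_worker(shared_array):
    # shared ctypes arrays cannot be pickled as task arguments,
    # so they are passed once per worker via the initializer
    global _shared
    _shared = shared_array

def compute(index):
    # hypothetical per-item work; each task writes only its own slot
    _shared[index] = index * 0.5

def fill_shared(n, processes=4):
    shared = Array('d', n)  # n doubles, zero-initialised, lock-protected
    with Pool(processes=processes, initializer=_init_worker, initargs=(shared,)) as pool:
        pool.map(compute, range(n))
    return shared[:]  # copy out as a plain list
```

This skips the separate manager server process. However, it only fits fixed-size arrays of ctypes values, whereas a Manager can also proxy lists and dicts.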
