How do I process a list in parallel in Python?

I wrote code like this:

def process(data):
    # create file using data
    pass

all = ["data1", "data2", "data3"]

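I want to run the process function over every item in my list in parallel. The files being created are small, so I am not worried about disk writes, but the processing takes a long time, so I want to use all of my cores.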

How can I do this using only the default modules in Python 2.7?

This assumes CPython and the GIL.

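If the task is I/O-bound, threading is generally the more efficient choice, because a thread simply hands the work to the operating system and sits idle until the I/O operation completes. Spawning processes is a heavyweight way to babysit I/O.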

However, most file systems are not concurrent, so multithreading or multiprocessing may be no faster than synchronous writes.

Nonetheless, here is a contrived multiprocessing.Pool.map example that may help with your CPU-bound work:

from multiprocessing import cpu_count, Pool

def process(data):
    # best to do heavy CPU-bound work here...
    # file write for demonstration
    with open("%s.txt" % data, "w") as f:
        f.write(data)
    # example of returning a result to the map
    return data.upper()

if __name__ == "__main__":
    tasks = ["data1", "data2", "data3"]
    # leave one core free, but never request fewer than one worker
    pool = Pool(max(1, cpu_count() - 1))
    print(pool.map(process, tasks))
    pool.close()
    pool.join()

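A similar thread-based setup is available through concurrent.futures.ThreadPoolExecutor. Note that concurrent.futures joined the standard library in Python 3.2; on Python 2.7 it is only available through the futures backport package.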
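A minimal sketch of the ThreadPoolExecutor setup mentioned above, assuming Python 3.2+ (or the futures backport on 2.7). The body of process is a stand-in for the real work, and max_workers=4 is an arbitrary choice for illustration, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor

def process(data):
    # Stand-in for the real work; returns a value so map() has something to collect.
    return data.upper()

tasks = ["data1", "data2", "data3"]

# Leaving the with-block waits for all submitted work to finish.
with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map yields results in the same order as the inputs.
    results = list(executor.map(process, tasks))

print(results)
```

Because executor.map preserves input order, the results line up with tasks even if the threads finish in a different order.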

By the way, all is a built-in function, so it is not a good choice for a variable name.

Alternatively:

from threading import Thread

def process(data):
    print("processing {}".format(data))

tasks = ["data1", "data2", "data3"]
for task in tasks:
    # one thread per item; each starts running immediately
    t = Thread(target=process, args=(task,))
    t.start()

Or (Python 3.6 and later only, because of the f-string):

from threading import Thread

def process(data):
    print(f"processing {data}")

tasks = ["data1", "data2", "data3"]
for task in tasks:
    # one thread per item; each starts running immediately
    t = Thread(target=process, args=(task,))
    t.start()
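The thread examples above start the threads but never wait for them or collect anything back. A sketch of one way to do both, assuming each worker writes into its own slot of a shared list; the extra results and index parameters are hypothetical additions for illustration, not part of the original answer:

```python
from threading import Thread

def process(data, results, index):
    # Hypothetical stand-in for the real work; each thread writes only its own slot.
    results[index] = "processed {}".format(data)

tasks = ["data1", "data2", "data3"]
results = [None] * len(tasks)

threads = []
for i, task in enumerate(tasks):
    t = Thread(target=process, args=(task, results, i))
    t.start()
    threads.append(t)

# join() blocks until the given thread has finished.
for t in threads:
    t.join()

print(results)
```

Keep in mind that under CPython's GIL this helps mainly when the work is I/O-bound; for CPU-bound work a process pool is the usual choice.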

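Here is a template using multiprocessing.dummy, which exposes the same Pool API as multiprocessing but is backed by threads. I hope it helps.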

from multiprocessing.dummy import Pool as ThreadPool

def process(data):
    print("processing {}".format(data))

alldata = ["data1", "data2", "data3"]
pool = ThreadPool()
results = pool.map(process, alldata)
pool.close()
pool.join()
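pool.map returns only once every item is done. If handling each result as soon as it is ready is more useful, the same thread pool also offers imap_unordered. This is a sketch only: the return value of process and the pool size of 4 are assumptions made for illustration.

```python
from multiprocessing.dummy import Pool as ThreadPool

def process(data):
    # Stand-in for the real work; returns a value to hand back to the caller.
    return data.upper()

alldata = ["data1", "data2", "data3"]
pool = ThreadPool(4)
completed = []
try:
    # Results arrive in completion order, which may differ from input order.
    for result in pool.imap_unordered(process, alldata):
        completed.append(result)
finally:
    pool.close()
    pool.join()

print(sorted(completed))
```

Since the order is not guaranteed, anything that needs to match results back to inputs should return the input alongside the result.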
