I am trying to learn the multiprocessing library. This works:
import multiprocessing
import os
import random

def generate_files(file_number, directory):
    var02 = str(int(100 * random.random()))
    with open(f"{directory}/sample{file_number}.csv", "w") as f:
        f.write(var02)

if __name__ == "__main__":
    N1 = 100
    # create directory for samples
    directory = "samples"
    if not os.path.exists(directory):
        os.makedirs(directory)
    cpu_count = int(os.environ["NUMBER_OF_PROCESSORS"])  # doesn't work on mac
    # generate using all cores
    for i in range(N1):
        process = multiprocessing.Process(target=generate_files, args=[i, directory])
        process.start()
The bad part is that the program creates 100 processes at once. I want to cap them at cpu_count, so it should look something like this:
for i in range(cpu_count):
    process = multiprocessing.Process(target=generate_files, args=[i, directory, cpu_count])
But then all the processes try to write to the same files, since the names come out identical. It also isn't clean when the number of files is not a multiple of the number of cores. Is there a good way to do this?
You can use multiprocessing's Pool together with its map function. You can specify how many processes the pool should use (it defaults to os.cpu_count()), and then run a function on every element of an iterable. So in your case you can do the following:
from multiprocessing import Pool
import os
import random

# create directory for samples
directory = "samples"

def generate_files(file_number):
    var02 = str(int(100 * random.random()))
    with open(f"{directory}/sample{file_number}.csv", "w") as f:
        f.write(var02)

if __name__ == "__main__":
    N1 = 100
    if not os.path.exists(directory):
        os.makedirs(directory)
    with Pool() as pool:
        pool.map(generate_files, range(N1))
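If you want to set the pool size explicitly rather than rely on the default, and pass the directory in as an argument instead of through a module-level global, a sketch along these lines should also work (Pool's processes argument and starmap, which unpacks each tuple into the call, are standard library features; the exact names here are just illustrative):

from multiprocessing import Pool
import os
import random

def generate_files(file_number, directory):
    var02 = str(int(100 * random.random()))
    with open(f"{directory}/sample{file_number}.csv", "w") as f:
        f.write(var02)

if __name__ == "__main__":
    N1 = 100
    directory = "samples"
    os.makedirs(directory, exist_ok=True)
    # cap the pool at the core count and hand each call its own
    # (file_number, directory) pair via starmap
    with Pool(processes=os.cpu_count()) as pool:
        pool.starmap(generate_files, [(i, directory) for i in range(N1)])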
I got it. Another way: first build the full list of file numbers up front, then have each process take numbers from it until it is empty.
import multiprocessing
import os
import random
from queue import Empty

def generate_files(file_numbers, directory):
    # take file numbers until the shared queue runs dry
    while True:
        try:
            file_number = file_numbers.get_nowait()
        except Empty:
            break
        var02 = str(int(100 * random.random()))
        with open(f"{directory}/sample{file_number}.csv", "w") as f:
            f.write(var02)

if __name__ == "__main__":
    N1 = 100
    # a plain list would just be copied into every child process,
    # so the numbers go into a queue that all workers really share
    file_numbers = multiprocessing.Queue()
    for e in range(N1):
        file_numbers.put(e)
    # create directory for samples
    directory = "samples"
    if not os.path.exists(directory):
        os.makedirs(directory)
    cpu_count = os.cpu_count()  # portable, unlike NUMBER_OF_PROCESSORS
    # generate using all cores
    processes = []
    for i in range(cpu_count):
        process = multiprocessing.Process(target=generate_files, args=[file_numbers, directory])
        process.start()
        processes.append(process)
    for process in processes:
        process.join()
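If you'd rather avoid shared state entirely, another option is to slice the numbers into one chunk per core before starting the workers; counts that are not a multiple of the core count are handled naturally, since the last chunks just come out one element shorter. A minimal sketch, assuming round-robin slicing is acceptable (this variant is my own addition, not from the posts above):

import multiprocessing
import os
import random

def generate_files(file_numbers, directory):
    for file_number in file_numbers:
        var02 = str(int(100 * random.random()))
        with open(f"{directory}/sample{file_number}.csv", "w") as f:
            f.write(var02)

if __name__ == "__main__":
    N1 = 100
    directory = "samples"
    os.makedirs(directory, exist_ok=True)
    cpu_count = os.cpu_count()
    # chunk k gets numbers k, k + cpu_count, k + 2*cpu_count, ...
    chunks = [list(range(N1))[k::cpu_count] for k in range(cpu_count)]
    processes = [
        multiprocessing.Process(target=generate_files, args=[chunk, directory])
        for chunk in chunks
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()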