Implementing Python multithreading in the calculation of a progress bar



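I am trying to create a progress bar similar to tqdm. Everything works, but I noticed that computing each step of the progress bar takes a lot of time for large iterables (len > 50). Here is my code.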

def progressbar(iterable):
    def new(index):
        #... print the progressbar
        pass  # body omitted in the original post

    for i in range(len(iterable)):
        new(i)
        yield iterable[i]

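The problem is that while the time new() takes to execute is negligible on small iterables, it becomes a problem on larger ones (this does not happen with the tqdm library). For example, the following code takes a few seconds to execute. It should be instant!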

iterator = progressbar(range(1000))
for i in iterator: pass

Can you suggest a way to remedy this? Perhaps by implementing multithreading?

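It is not clear what the problem is (you have not shown all of the calculations), but I believe your approach can be improved in the way the progress bar handles the iterable: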

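  1. First, you assume that the iterable is indexable, which may not always be the case.
  2. If it is a generator function, its length cannot be determined with the len function. Converting the generator to a list to get its length is not necessarily efficient, and it may defeat the purpose of having a progress bar, as in the example below. Your interface should therefore allow the user to pass an optional total argument (as tqdm does) to specify the length of the iterable explicitly.
  3. You can do some calculations up front, outside the function new, so that new can quickly compute the width of the bar from the value of its index argument.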
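Points 1 and 2 can be sketched as a small helper. This is only an illustration: resolve_total is a hypothetical name, not part of tqdm or of the code in this answer. It prefers an explicit total, falls back to len() when the iterable supports it, and otherwise reports that the length is unknown instead of consuming the iterable.

```python
def resolve_total(iterable, total=None):
    """Return the number of items to expect, or None if it cannot be known."""
    if total is not None:
        return total          # the caller told us explicitly
    try:
        return len(iterable)  # works for lists, ranges, tuples, ...
    except TypeError:
        return None           # generators and other unsized iterables


def numbers():
    # A generator: it has no len() until it has been fully consumed.
    yield from range(5)


known = resolve_total(range(1000))            # sized iterable
unknown = resolve_total(numbers())            # unsized, no total given
explicit = resolve_total(numbers(), total=5)  # unsized, total supplied
print(known, unknown, explicit)
```

When the helper returns None, a caller could show a simple item counter rather than a bar; that choice is left open here.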

I suggest the following changes:

from math import floor
import time

def progressbar(iterable, total=None):
    def new(index):
        # Print the progress bar for the given zero-based index.
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')

    if total is None:
        # We must convert to a list to determine the length.
        iterable = list(iterable)
        total = len(iterable)
    it = iter(iterable)
    width = 60  # width of progress bar
    division = total / width  # each division represents this many completions
    try:
        for i in range(total):
            # Ensure the next value exists before printing the bar:
            yield next(it)
            new(i)
    except StopIteration:
        pass
    print()

def fun():
    # A slow generator: each value takes .03 seconds to produce.
    for i in range(1000):
        time.sleep(.03)
        yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
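A further refinement in the spirit of point 3 is to redraw the bar only when its appearance actually changes. The sketch below assumes that writing to the terminal dominates the per-step cost, which the question does not confirm. The name throttled_progressbar and the out parameter are hypothetical and used only for illustration.

```python
import io
import sys


def throttled_progressbar(iterable, total, width=60, out=sys.stdout):
    """Yield items from iterable, redrawing the bar only when it changes."""
    last_filled = -1  # number of cells drawn at the previous redraw
    for done, item in enumerate(iterable, start=1):
        yield item
        # Integer arithmetic: how many of the `width` cells are filled now.
        filled = min(width, done * width // total) if total else width
        if filled != last_filled:
            out.write('\r|' + '.' * filled + ' ' * (width - filled) + '|')
            out.flush()
            last_filled = filled
    out.write('\n')


# Usage: capture the output in a buffer so the redraws can be counted.
buffer = io.StringIO()
items = list(throttled_progressbar(range(1000), total=1000, out=buffer))
redraws = buffer.getvalue().count('\r')
print(len(items), "items,", redraws, "redraws")
```

With 1000 items and a 60-cell bar, the bar is written once per distinct fill level rather than once per item.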

Multithreading

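Using multithreading as a way to speed up processing is problematic. The following is a (naive) attempt that fails: although multithreading is used to fetch values from the generator function fun, the generator still produces a value only once every .03 seconds. It should also be clear that if the iterable is, for example, a simple list, multithreading cannot iterate over the list any faster than a single thread can: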

from math import floor
from multiprocessing.pool import ThreadPool
import time

def progressbar(iterable, total=None):
    def new(index):
        # Print the progress bar for the given zero-based index.
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')

    if total is None:
        # We must convert to a list to determine the length.
        iterable = list(iterable)
        total = len(iterable)
    width = 60  # width of progress bar
    division = total / width  # each division represents this many completions

    # The pool can only pull values as fast as the iterable produces them.
    with ThreadPool(20) as pool:
        for i, result in enumerate(pool.imap(lambda x: x, iterable)):
            yield result
            new(i)
    print()

def fun():
    # A slow generator: each value takes .03 seconds to produce.
    for i in range(1000):
        time.sleep(.03)
        yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
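The claim can be checked with a small timing sketch. It uses a shorter, hypothetical generator (slow_source, with 20 items and a .01 second delay, both chosen only to keep the run short) and compares consuming it directly with consuming it through a ThreadPool. Under these assumptions both runs should take at least N * DELAY seconds, because the generator is still advanced one value at a time.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_source(n, delay):
    """Hypothetical generator that needs `delay` seconds to produce each value."""
    for i in range(n):
        time.sleep(delay)
        yield i


def timed(consume):
    """Run consume() and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = consume()
    return result, time.perf_counter() - start


N, DELAY = 20, 0.01


def via_pool():
    # The pool's feeder thread still pulls from the generator sequentially.
    with ThreadPool(4) as pool:
        return list(pool.imap(lambda x: x, slow_source(N, DELAY)))


plain, t_plain = timed(lambda: list(slow_source(N, DELAY)))
pooled, t_pool = timed(via_pool)
print(f"plain: {t_plain:.2f}s, pooled: {t_pool:.2f}s")
```

The exact timings depend on the machine, so no expected output is given; the point is only that the pooled run cannot drop below the time the generator itself needs.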

If the generator function itself used multithreading, processing would be faster. But, of course, one has no control over how the iterable is created:

from math import floor
from multiprocessing.pool import ThreadPool
import time

def progressbar(iterable, total=None):
    def new(index):
        # Print the progress bar for the given zero-based index.
        n_division = floor(index / division + .5)
        remainder = width - n_division
        print('|', '.' * n_division, ' ' * remainder, '|', sep='', end='\r')

    if total is None:
        # We must convert to a list to determine the length.
        iterable = list(iterable)
        total = len(iterable)
    it = iter(iterable)
    width = 60  # width of progress bar
    division = total / width  # each division represents this many completions
    try:
        for i in range(total):
            # Ensure the next value exists before printing the bar:
            yield next(it)
            new(i)
    except StopIteration:
        pass
    print()

def fun():
    def fun2(i):
        time.sleep(.03)
        return i

    # Here the generator itself spreads the slow work across 20 threads.
    with ThreadPool(20) as pool:
        for i in pool.imap(fun2, range(1000)):
            yield i

iterator = progressbar(fun(), total=1000)
values = [i for i in iterator]
print(values[0], values[-1])
