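I am using multiprocessing in my Python scraping program to speed it up. However, the program takes 30 seconds to run, so I looked under the hood and found that the actual processing only takes 4.5 seconds. So does going from 1 core to 5 cores really cost 25 seconds?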
    mp_start_time = time.time()
    with mp.Pool() as pool:
        output = pool.map(
            self.parallel_process,
            [(
                link,
                question,
                25,
                crawler.webcrawler,
                self.translator,
            ) for link in links]
        )
    print('Multiprocesses time:', time.time() - mp_start_time)
The multiprocessing function:
    def parallel_process(self, inputs):
        # inputs: a single tuple holding all arguments, because pool.map
        # passes exactly one argument to the worker function
        mli_start_time = time.time()
        # Variables
        link = inputs[0]
        question = inputs[1]
        n_sentences = inputs[2]
        # Modules
        webcrawler = inputs[3]
        translator = inputs[4]
        # Note: the tuple shown in the call has only five elements, so as
        # posted this index would raise IndexError; presumably the ML
        # prediction object was passed as a sixth element.
        question_answering = inputs[5]
        print('ml init time', time.time() - mli_start_time)
        # Crawl the website
        # The question is used to rank the sentences
        wc_start_time = time.time()
        webdata = webcrawler(link, question, n_sentences)
        print('crawler time:', time.time() - wc_start_time)
        # Translate into English so our ML engine understands the text
        # The source language is detected automatically
        webdata_english = translator.translate(webdata, dest='en').text
        return webdata_english
Is that really what is happening, or is something else going on? How can I fix it?
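I never properly solved this, but the cause was probably that one of the arguments I was passing was too large: the ML prediction class. Every argument handed to a worker has to be pickled and sent to another process, so a large object can make that transfer cost more than the work itself. In the end I just fell back to a plain sequential loop.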
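For anyone hitting the same thing, one common way around it is to build the heavy object once per worker with the `initializer` argument of `mp.Pool`, so each task only carries small values. The sketch below is not my original code: `HeavyModel`, `init_worker`, `process_link` and `task_payload_size` are made-up names for illustration, and the model is a dummy. Note also that passing a bound method like `self.parallel_process` to `pool.map` pickles the instance it is bound to, so anything large stored on `self` may be sent along as well.

```python
import multiprocessing as mp
import pickle


class HeavyModel:
    """Hypothetical stand-in for the large ML prediction object."""

    def __init__(self, size=1_000_000):
        self.weights = bytes(size)  # placeholder for large model state

    def predict(self, text):
        return len(text)  # dummy "prediction"


_model = None  # one copy per worker process, set by init_worker


def init_worker(model_size):
    """Runs once in each worker: build the heavy object there instead of sending it."""
    global _model
    _model = HeavyModel(model_size)


def process_link(task):
    """Module-level worker; each task is a small tuple that is cheap to pickle."""
    link, question = task
    return link, _model.predict(link + question)


def task_payload_size(task):
    """Approximate number of bytes that must be serialized to send this task."""
    return len(pickle.dumps(task))


if __name__ == "__main__":
    links = ["https://example.com/a", "https://example.com/b"]
    tasks = [(link, "some question") for link in links]
    with mp.Pool(processes=2, initializer=init_worker, initargs=(1_000_000,)) as pool:
        print(pool.map(process_link, tasks))
```

Calling something like `task_payload_size` on one of the original argument tuples is a quick way to check whether serialization is the likely culprit before restructuring anything.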