How do I process multiple lists at once?



I have a big list of numbers. I want to split it into x smaller lists and process them in parallel.

This is the code I have so far:

from multiprocessing import Pool
import numpy

def processNumList(numList):
    for num in numList:
        outputList.append(num ** 2)

numThreads = 5
bigNumList = list(range(50))
splitNumLists = numpy.array_split(bigNumList, numThreads)
outputList = []
for numList in splitNumLists:
    processNumList(numList)
print(outputList)

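The code above does the following:

  • Splits a big list of numbers into the specified number of smaller lists
  • Passes each of those lists to the processNumList function
  • Prints the result list afterwards

Everything there works as expected, but it only processes one list at a time. I want all of the lists to be processed simultaneously.

What is the correct code to do this? I experimented with Pool but could never seem to get it working.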

You could try something like this:

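import threading

class MyThread(threading.Thread):
    def __init__(self, num_list, results):
        super().__init__()
        # run() takes no arguments, so keep them on the instance
        self.num_list = num_list
        self.results = results

    def run(self):
        # your logic to process the list
        self.results.extend(num ** 2 for num in self.num_list)

# split the list as you already did, then start one thread per chunk
threads = [MyThread(numList, outputList) for numList in splitNumLists]
for thread in threads:
    thread.start()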

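This is the code I ended up using.

I used threading.Thread() to process the lists asynchronously, then called thread.join() to make sure every thread had finished before moving on.

I added time.sleep for demonstration purposes (to simulate a lengthy task), but obviously you wouldn't want it in production code.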

import numpy
import threading
import time

def process_num_list(numList):
    for num in numList:
        output_list.append(num ** 2)
        time.sleep(1)

num_threads = 5
big_num_list = list(range(30))
split_num_lists = numpy.array_split(big_num_list, num_threads)
output_list = []
threads = []
for num_list in split_num_lists:
    thread = threading.Thread(target=process_num_list, args=[num_list])
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()
print(output_list)
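
Since the question mentions Pool: in standard CPython the threads above take turns on CPU-bound work like num ** 2, so for real parallelism a process pool may be the better fit. Here is a minimal sketch of that variant, not the code I actually used. The helper names square_chunk, split_evenly and process_in_parallel are made up for illustration, and split_evenly is a plain-Python stand-in for numpy.array_split.

```python
from multiprocessing import Pool


def square_chunk(num_list):
    """Square every number in one chunk and return the results."""
    return [num ** 2 for num in num_list]


def split_evenly(items, num_chunks):
    """Split items into num_chunks contiguous, nearly equal parts."""
    size, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks get one additional item each
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_in_parallel(numbers, num_workers):
    chunks = split_evenly(numbers, num_workers)
    with Pool(processes=num_workers) as pool:
        # map() blocks until every chunk is done and keeps the chunk order
        results = pool.map(square_chunk, chunks)
    # Flatten the per-chunk results back into one list
    return [value for chunk in results for value in chunk]


if __name__ == "__main__":
    print(process_in_parallel(list(range(50)), 5))
```

Each worker returns its results instead of appending to a shared list, because separate processes do not share the parent's output list. The if __name__ == "__main__" guard matters on platforms that start workers by re-importing the script, such as Windows.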

 


As a bonus, here is a working example that drives five Selenium windows:

from selenium import webdriver
import numpy
import threading
import time

def scrapeSites(siteList):
    print("Preparing to scrape " + str(len(siteList)) + " sites")
    # executable_path is the Selenium 3 style; Selenium 4 takes a Service object instead
    driver = webdriver.Chrome(executable_path = r"..\chromedriver.exe")
    driver.set_window_size(700, 400)
    for site in siteList:
        print("\nNow scraping " + site)
        driver.get(site)
        pageTitles.append(driver.title)
    driver.quit()

numThreads = 5
fullWebsiteList = ["https://en.wikipedia.org/wiki/Special:Random"] * 30
splitWebsiteLists = numpy.array_split(fullWebsiteList, numThreads)
pageTitles = []
threads = []
for websiteList in splitWebsiteLists:
    # One thread (and therefore one browser window) per chunk of sites
    thread = threading.Thread(target=scrapeSites, args=[websiteList])
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()
print(pageTitles)
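
If you would rather not manage the thread list and the shared results list by hand, the same start-then-join pattern can be sketched with concurrent.futures.ThreadPoolExecutor from the standard library. This is only an illustration under assumptions: fetch_title is a hypothetical stand-in that sleeps instead of driving a browser, and the example.com URLs are placeholders that are never actually requested.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_title(site):
    """Hypothetical stand-in for the Selenium work: pretend to load a page."""
    time.sleep(0.1)  # simulates waiting on the network
    return "Title of " + site


def scrape_all(sites, num_threads):
    # executor.map runs fetch_title on worker threads and yields the
    # results in the same order as the input list
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        return list(executor.map(fetch_title, sites))


sites = ["https://example.com/page" + str(i) for i in range(10)]
titles = scrape_all(sites, num_threads=5)
print(titles)
```

Leaving the with block waits for all workers, much like the join() loop above. Note that this version hands out one site per task rather than one chunk per thread, so with real Selenium you would probably still want one driver per thread, as in my code above.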
