I have a large list of numbers. I want to split this big list into x smaller lists and process them in parallel.
Here is my code so far:
from multiprocessing import Pool
import numpy

def processNumList(numList):
    for num in numList:
        outputList.append(num ** 2)

numThreads = 5
bigNumList = list(range(50))
splitNumLists = numpy.array_split(bigNumList, numThreads)
outputList = []

for numList in splitNumLists:
    processNumList(numList)

print(outputList)
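The code above does the following:
- Splits a big list of numbers into the specified number of smaller lists
- Passes each smaller list to the processNumList function
- Prints the resulting list afterwards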
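The splitting step can also be done without NumPy. The following is a minimal sketch, assuming a plain Python list as input; split_list is a hypothetical helper name. It mimics the chunk sizes numpy.array_split produces, where the first len(items) % n chunks get one extra item:

```python
def split_list(items, n):
    """Split items into n consecutive chunks whose sizes differ by at most one.

    Like numpy.array_split, the first len(items) % n chunks get one extra item.
    """
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


chunks = split_list(list(range(18)), 5)
print([len(c) for c in chunks])  # [4, 4, 4, 3, 3]
print(chunks[0])  # [0, 1, 2, 3]
```

The chunks are contiguous slices, so concatenating them in order gives back the original list.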
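Everything there works as expected, but it only processes one list at a time. I want all of the lists to be processed at the same time.
What is the correct code to do this? I tried Pool, but I could never get it to work.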
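You could try something like this:
import threading

class MyThread(threading.Thread):
    def __init__(self, num_list):
        super().__init__()  # always initialise the Thread base class
        # store whatever the thread needs; run() itself takes no extra arguments
        self.num_list = num_list

    def run(self):
        # your logic to process the list
        for num in self.num_list:
            outputList.append(num ** 2)

# split the list as you already did (splitNumLists in the question)
for numList in splitNumLists:
    MyThread(numList).start()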
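Building on the Thread-subclass idea, one way to avoid having every thread append to a single shared list (where results from different threads can interleave) is to let each thread keep its own results and combine them after join(). This is a sketch only; SquareThread and the hard-coded chunks are hypothetical stand-ins for the question's code:

```python
import threading


class SquareThread(threading.Thread):
    """Thread subclass that squares one chunk and keeps its own results."""

    def __init__(self, num_list):
        super().__init__()  # required so threading.Thread is set up
        self.num_list = num_list
        self.results = []

    def run(self):
        # run() takes no extra arguments; it reads what __init__ stored
        for num in self.num_list:
            self.results.append(num ** 2)


chunks = [[0, 1, 2], [3, 4, 5], [6, 7]]
threads = [SquareThread(chunk) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread before reading results

# concatenating in thread order keeps the original ordering
output = [square for t in threads for square in t.results]
print(output)  # [0, 1, 4, 9, 16, 25, 36, 49]
```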
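This is the code I ended up using. I used threading.Thread() to process the lists concurrently, then called thread.join() to make sure all the threads had finished before moving on.
I added time.sleep for demonstration purposes (to simulate a lengthy task), but obviously you don't want it in production code.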
import numpy
import threading
import time

def process_num_list(num_list):
    for num in num_list:
        output_list.append(num ** 2)
        time.sleep(1)

num_threads = 5
big_num_list = list(range(30))
split_num_lists = numpy.array_split(big_num_list, num_threads)
output_list = []
threads = []

for num_list in split_num_lists:
    thread = threading.Thread(target=process_num_list, args=[num_list])
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(output_list)
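In standard CPython builds, threads do not execute Python bytecode in parallel because of the global interpreter lock (GIL). The threaded version above therefore overlaps waiting (the time.sleep calls, or network I/O) but will not speed up CPU-bound work like squaring numbers. For that kind of work, multiprocessing.Pool, which the question mentions, is the usual tool. A likely reason it seemed not to work with the question's code is that processNumList appends to the global outputList: each worker process appends to its own copy of that list, so the parent's list stays empty. Below is a minimal sketch that returns results instead; square_chunk and main are hypothetical names:

```python
from multiprocessing import Pool


def square_chunk(num_list):
    # Runs in a worker process: return the result instead of appending to a
    # global list, because each process has its own copy of module globals.
    return [num ** 2 for num in num_list]


def main():
    num_workers = 5
    big_num_list = list(range(50))
    # 50 numbers -> 5 chunks of 10 (numpy.array_split gives the same here)
    chunk_size = len(big_num_list) // num_workers
    chunks = [big_num_list[i:i + chunk_size]
              for i in range(0, len(big_num_list), chunk_size)]
    with Pool(processes=num_workers) as pool:
        # map() returns one result per chunk, in the same order as chunks
        per_chunk = pool.map(square_chunk, chunks)
    return [square for chunk in per_chunk for square in chunk]


# The guard is needed so worker processes can import this module safely.
if __name__ == "__main__":
    print(main())
```

Pool.map can also do the splitting itself through its chunksize argument, for example pool.map(square, big_num_list, chunksize=10) with a function that squares a single number.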
As a bonus, here is a working example with five Selenium windows:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import numpy
import threading
import time

def scrapeSites(siteList):
    print("Preparing to scrape " + str(len(siteList)) + " sites")
    # Selenium 4 passes the driver path through a Service object
    # (executable_path= on webdriver.Chrome() was removed in Selenium 4.10)
    driver = webdriver.Chrome(service=Service(executable_path=r"..\chromedriver.exe"))
    driver.set_window_size(700, 400)
    for site in siteList:
        print("\nNow scraping " + site)
        driver.get(site)
        pageTitles.append(driver.title)
    driver.quit()

numThreads = 5
fullWebsiteList = ["https://en.wikipedia.org/wiki/Special:Random"] * 30
splitWebsiteLists = numpy.array_split(fullWebsiteList, numThreads)
pageTitles = []
threads = []

for websiteList in splitWebsiteLists:
    thread = threading.Thread(target=scrapeSites, args=[websiteList])
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(pageTitles)
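For I/O-bound work like this, the standard-library concurrent.futures.ThreadPoolExecutor can replace the manual start()/join() bookkeeping and collect return values in input order. The sketch below assumes a hypothetical scrape_sites stand-in that sleeps instead of driving a browser, so it runs without Selenium:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_sites(site_list):
    # Hypothetical stand-in for scrapeSites: sleep instead of driving a
    # browser, and return titles rather than appending to a global list.
    titles = []
    for site in site_list:
        time.sleep(0.1)  # simulates waiting on a page load
        titles.append("title of " + site)
    return titles


num_threads = 5
full_website_list = ["https://en.wikipedia.org/wiki/Special:Random"] * 30
# 30 sites -> 5 contiguous chunks of 6
chunk_size = len(full_website_list) // num_threads
split_website_lists = [full_website_list[i:i + chunk_size]
                       for i in range(0, len(full_website_list), chunk_size)]

with ThreadPoolExecutor(max_workers=num_threads) as executor:
    # map() submits every chunk at once; iterating its results waits for
    # each one and yields them in input order
    page_titles = [title
                   for titles in executor.map(scrape_sites, split_website_lists)
                   for title in titles]

print(len(page_titles))  # 30
```

In the real scraper, each call would still create and quit its own webdriver, as scrapeSites does, since WebDriver instances are generally not safe to share between threads.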