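I am inexperienced with threading in Python and am trying to write some simple multithreaded programs to get more practice. I am trying to send requests to a predefined list of URLs.
When I run the program, it finishes immediately and prints "End", with no exit or exception. The print call placed in thread_function is never executed, and no error is raised.
Any help would be appreciated.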
import networking
import threading
import concurrent.futures

class concurrencyTest:
    def __init__(self, URLlist):
        self.URLlist = URLlist
        self.resourceDict = {}
        self._urlListLock = threading.Lock()
        self._resourceListLock = threading.Lock()

    def sendMultiThreadedRequests(self, threadNum=3):
        self.resourceDict = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=threadNum) as executor:
            results = executor.map(self.thread_function)

    def thread_function(self):
        print("You are are in the thread_function")
        while True:
            with self._urlListLock:
                numOfRemainingURL = len(self.URLlist)
                print(numOfRemainingURL)
                if numOfRemainingURL == 0:
                    return
                urlToRequest = self.URLlist.pop()
            webpage = networking.getWebpage(urlToRequest)
            ##parse webpage or resource
            with self._resourceListLock:
                self.resourceDict[urlToRequest] = webpage

    def sendRegularRequests(self):
        self.resourceDict = {}
        for url in self.URLlist:
            resource = networking.getWebpage(url)
            self.resourceDict[url] = resource

    def updateURLpool(self):
        return "Not currently coded"

def main():
    #The real urlList is a lot larger than just 3 URLs
    urlList = ["www.google.com","www.stackoverflow.com","www.reddit.com"]
    parTest = concurrencyTest(urlList)
    parTest.sendMultiThreadedRequests()
    print("End")

main()
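executor.map() maps the values of an iterable onto function calls. It expects an iterable (for example a list) as its second argument and passes its items, one per call, to the function given as the first argument.
For example:
executor.map(self.thread_function, self.URLlist)
calls thread_function once for each value in URLlist. Passing the values as separate arguments, as in
executor.map(self.thread_function, url1, url2, url3, ..., urln)
does not do the same thing: like the built-in map(), executor.map() treats every additional argument as an iterable of its own and passes one item from each to every call.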
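A small sketch can make the behaviour of map() concrete. The function record and the sample values are made up for illustration; only ThreadPoolExecutor.map() comes from the standard library.

```python
import concurrent.futures


def record(*args):
    # Hypothetical stand-in for a worker: return whatever it was given.
    return args


with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # No iterable at all: there is nothing to map, so record is never called.
    no_calls = list(executor.map(record))

    # One iterable: one call per item.
    one_iterable = list(executor.map(record, ["a", "b", "c"]))

    # Two iterables: consumed in parallel, one item from each per call.
    two_iterables = list(executor.map(record, ["a", "b"], [1, 2]))

print(no_calls)       # []
print(one_iterable)   # [('a',), ('b',), ('c',)]
print(two_iterables)  # [('a', 1), ('b', 2)]
```

The first case matches the symptoms in the question: no call is made and no error is raised.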
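Because map() passes one item per call, thread_function() needs to accept a parameter to receive the value from the list: thread_function(self, url). Since the function now receives only one value of URLlist at a time, the while loop inside it no longer makes sense, and the function has to be refactored to process a single URL instead of the whole list: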
def thread_function(self, url):
    # getWebpage lives in the asker's networking module
    webpage = networking.getWebpage(url)
    # parse webpage or resource
    with self._resourceListLock:
        self.resourceDict[url] = webpage
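Put together, the refactored class might look roughly like the sketch below. Since the networking module from the question is not available here, fake_get_webpage is a hypothetical stand-in that performs no network access, and the class is trimmed to the parts that matter for the example.

```python
import concurrent.futures
import threading


def fake_get_webpage(url):
    # Hypothetical stand-in for networking.getWebpage(); no network access.
    return f"<html>content of {url}</html>"


class ConcurrencyTest:
    def __init__(self, url_list):
        self.URLlist = url_list
        self.resourceDict = {}
        self._resourceListLock = threading.Lock()

    def sendMultiThreadedRequests(self, threadNum=3):
        self.resourceDict = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=threadNum) as executor:
            # Consume the iterator so that an exception raised in a worker
            # is re-raised here instead of passing unnoticed.
            list(executor.map(self.thread_function, self.URLlist))

    def thread_function(self, url):
        # Handles exactly one URL per call.
        webpage = fake_get_webpage(url)
        with self._resourceListLock:
            self.resourceDict[url] = webpage


urls = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
test = ConcurrencyTest(urls)
test.sendMultiThreadedRequests()
print(sorted(test.resourceDict))
```

The URL list lock from the question is no longer needed in this version, because the executor hands each URL to exactly one call.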
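Alternatively, you can use submit() instead of map(). Its only purpose is to run a function asynchronously, so thread_function() does not need to be modified. Note that each submit() call schedules a single invocation; to have several threads working through the list, call it once per worker: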
executor.submit(self.thread_function)
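As a rough illustration of the submit() approach, the sketch below keeps the original pull-from-a-shared-list loop and submits it once per worker. The names worker, fake_get_webpage, and the module-level variables are assumptions made for the example, not part of the question's code.

```python
import concurrent.futures
import threading

url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
url_lock = threading.Lock()
resources = {}
resource_lock = threading.Lock()


def fake_get_webpage(url):
    # Hypothetical stand-in for networking.getWebpage(); no network access.
    return f"<html>content of {url}</html>"


def worker():
    # Same shape as the original thread_function: pull URLs until none are left.
    while True:
        with url_lock:
            if not url_list:
                return
            url = url_list.pop()
        webpage = fake_get_webpage(url)
        with resource_lock:
            resources[url] = webpage


thread_num = 3
with concurrent.futures.ThreadPoolExecutor(max_workers=thread_num) as executor:
    # One submit() call schedules one invocation, so submit once per worker.
    futures = [executor.submit(worker) for _ in range(thread_num)]
    for future in futures:
        # result() re-raises any exception that occurred inside the worker.
        future.result()

print(len(resources))  # 3
```

Calling result() on each future is worth the extra lines: without it, an exception inside a worker stays stored in the future and the program appears to finish normally.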
If you want to use concurrent.futures:
You never pass an iterable to .map(), so nothing is executed. To simplify things (you do not need any locks either):
import concurrent.futures
import random
import time
import hashlib

def get_data(url):
    print(f"Starting to get {url}")
    # to pretend doing some work:
    time.sleep(random.uniform(0.5, 1))
    result = hashlib.sha1(url.encode("utf-8")).hexdigest()
    print(f"OK: {url}")
    return (url, result)

url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = {}
    for key, value in executor.map(get_data, url_list):
        results[key] = value
        print(f"Results acquired: {len(results)}")
    # or more simply
    # results = dict(executor.map(get_data, url_list))
print(results)
This prints, for example (the order is random):
Starting to get www.google.com
Starting to get www.stackoverflow.com
Starting to get www.reddit.com
OK: www.google.com
Results acquired: 1
OK: www.stackoverflow.com
Results acquired: 2
OK: www.reddit.com
Results acquired: 3
{'www.google.com': 'd8b99f68b208b5453b391cb0c6c3d6a9824f3c3a', 'www.stackoverflow.com': '3954ca3139369180fff4ea3ae984b9a7871b540d', 'www.reddit.com': 'f420470addba27b8577bb40e02229e90af568d69'}
If you want to use multiprocessing (with the same get_data function as above):
from multiprocessing.pool import ThreadPool, Pool
# (choose between threads or processes)
with ThreadPool(3) as p:
    results = dict(p.imap_unordered(get_data, url_list))
print(results)
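For comparison, executor.map() returns results in input order, while imap_unordered() yields them as they finish. If completion order is wanted while staying with concurrent.futures, as_completed() is one option. The following is only a minimal sketch; its get_data is a simplified variant of the function above without the artificial delay.

```python
import concurrent.futures
import hashlib


def get_data(url):
    # Simplified variant of get_data above: no sleep, no printing.
    return (url, hashlib.sha1(url.encode("utf-8")).hexdigest())


url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
results = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(get_data, url) for url in url_list]
    # as_completed() yields each future as soon as it finishes,
    # regardless of the order in which the work was submitted.
    for future in concurrent.futures.as_completed(futures):
        url, digest = future.result()
        results[url] = digest

print(len(results))  # 3
```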