Python Selenium: Closing all webdriver instances



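I am working on a browser automation project that runs some browser tasks in parallel. The idea is to:

  • Open four browsers
  • Do some tasks
  • Wait for all browsers to finish their tasks, then close all of them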

Here is a simple webdriver function for demonstration.

# For initializing webdriver
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

def initialize_driver(starting_url: str = 'https://www.google.com/'):
    ''' Open a webdriver and go to Google
    '''
    # Webdriver option(s): keep webdriver opened
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True)
    # Initialize webdriver
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()),
        options=chrome_options)

    # Open website; wait until fully loaded
    driver.get(starting_url)
    driver.implicitly_wait(10)
    time.sleep(1)
    return driver

With this function, I can now create four jobs that run in parallel using multiprocessing.

# Import package
import multiprocessing as mp
# List of workers
workers = []
# Run in parallel
for _ in range(4):
    worker = mp.Process(target=phm2.worker_bot_test)
    worker.start()
    workers.append(worker)
for worker in workers:
    worker.join()

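This covers the first two points, but as far as I know, driver.close() can only close one webdriver at a time. Is there a way to close them all at once? I actually tried creating a list of webdrivers and appending each driver to it at the end of the function, then closing them one by one. But for some reason, it does not work.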

# I added drivers.append(driver) at the end of the function from earlier
# This will now be a global variable to store the list of drivers
drivers = []
# Insert multiprocessing code here...
# Close all drivers
for driver in drivers:
    driver.close()

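What can I try to accomplish the last step? I have seen that we can tweak the Process class to include a return value (having return values would help a lot), but as much as possible I would rather not do that, since it is a bit complicated.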

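Each webdriver object is an absolutely independent object instance.
Just as applying, for example, the get() method to one particular webdriver object has no effect on any other webdriver object, applying quit() or close() to one webdriver object has absolutely no effect on any other webdriver object.
So the only way to close all the webdriver sessions is to keep all the webdriver objects in some structure, such as a list.
When you need to close all the sessions, iterate over that list and call driver.quit() on each object in it.
By the way, to close a session cleanly you should use the quit() method, not close(): close() only closes the current window, while quit() ends the whole session and shuts down the driver process.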
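
The list approach can be sketched as follows, assuming the browsers are driven from threads in a single process rather than from separate processes, so that the list and the drivers share one memory space. FakeDriver is a hypothetical stand-in for webdriver.Chrome so the sketch runs without a browser, and run_all and open_and_work are illustrative names, not part of Selenium. A thread pool is used here because it hands back return values without any changes to a Process class.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome (an assumption, not Selenium)."""

    def __init__(self):
        self.is_open = True
        self.url = None

    def get(self, url):
        self.url = url

    def quit(self):
        self.is_open = False


def run_all(make_driver, urls):
    """Open one driver per URL in parallel, then quit every driver."""
    drivers = []                 # shared list, visible to every thread
    lock = threading.Lock()

    def open_and_work(url):
        driver = make_driver()
        with lock:               # register the driver before doing any work
            drivers.append(driver)
        driver.get(url)          # the per-browser task would go here
        return url

    try:
        with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
            # list() waits for every task and re-raises any worker error
            results = list(pool.map(open_and_work, urls))
    finally:
        for driver in drivers:   # runs even if one of the tasks failed
            driver.quit()
    return results, drivers


if __name__ == "__main__":
    results, drivers = run_all(FakeDriver, ["https://www.google.com/"] * 4)
    print(len(drivers), all(not d.is_open for d in drivers))
```

With real browsers, make_driver would be a function that builds and returns a Chrome driver, such as one based on initialize_driver from the question.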

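I would first note that since the Selenium driver already runs as a child process, you really only need multithreading. I am assuming that whatever work your threads do after retrieving a web page and its elements is not particularly CPU-intensive. If that is not the case, you can always create a multiprocessing pool that is passed to the worker_bot_test worker function and used to perform any CPU-intensive operations in parallel.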
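
A rough sketch of that last idea, under stated assumptions: count_anchor_tags is a made-up CPU-bound step, the pages are fabricated strings standing in for driver.page_source, and worker is an illustrative name. Each thread hands its heavy step to a shared multiprocessing pool and waits for the result.

```python
import multiprocessing as mp
import threading


def count_anchor_tags(html):
    """Hypothetical CPU-bound step: a crude count of '<a' in the page."""
    return html.count("<a")


def worker(pool, html, results, index):
    # In real use, html would come from driver.page_source in this thread.
    # The heavy step runs in a pool process; this thread only waits for it.
    results[index] = pool.apply(count_anchor_tags, (html,))


if __name__ == "__main__":
    pages = ['<a href="x"></a>' * n for n in range(1, 5)]  # made-up pages
    results = [None] * len(pages)
    with mp.Pool(processes=2) as pool:  # created before the threads start
        threads = [
            threading.Thread(target=worker, args=(pool, page, results, i))
            for i, page in enumerate(pages)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    print(results)
```

The pool is created before any thread starts so that its worker processes are not forked from an already multithreaded process.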

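By using threads, we can create a class that creates the driver and has a __del__ finalizer that "quits" the driver when the class instance is garbage collected. We keep a reference to the class instance in thread-local storage so that the finalizer is called only when the thread terminates and its thread-local storage is garbage collected. To ensure that this garbage collection takes place, we can explicitly call gc.collect after the child threads terminate. If we were using multiprocessing instead of multithreading, the call to gc.collect would have no effect, since it only garbage collects the current process.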

# For initializing webdriver
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
import threading

class ChromeDriver:
    def __init__(self, starting_url):
        chrome_options = Options()
        chrome_options.add_experimental_option("detach", True)
        # Not a bad option to add:
        #chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
        # If we don't need to see the browsers:
        #chrome_options.add_argument("headless")
        # Initialize webdriver
        self.driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install()),
            options=chrome_options)
        # Open website; wait until fully loaded
        self.driver.get(starting_url)
        self.driver.implicitly_wait(10)
        # What is the purpose of the following line?
        #time.sleep(1)

    def __del__(self):
        self.driver.quit() # clean up driver when we are cleaned up
        print('The driver has been "quitted".')

threadLocal = threading.local()

def initialize_driver(starting_url: str = 'https://www.google.com/'):
    chrome_driver = ChromeDriver(starting_url)
    # Make sure there is a reference to the ChromeDriver instance so that
    # it is not prematurely finalized:
    threadLocal.driver = chrome_driver
    return chrome_driver.driver

def worker_bot_test():
    driver = initialize_driver()
    print(len(driver.page_source))

if __name__ == '__main__':
    # List of workers
    workers = []
    # Run in parallel
    for _ in range(4):
        worker = threading.Thread(target=worker_bot_test)
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
    # Ensure finalizers are executed:
    import gc
    gc.collect()

Prints:

...
163036
163050
163183
165486
The driver has been "quitted".
The driver has been "quitted".
The driver has been "quitted".
The driver has been "quitted".