How to close/quit Selenium Chrome drivers spawned by multiprocessing.Pool



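I have a list of article titles and ids that I use to generate article URLs and scrape their content. I am using multiprocessing.Pool to parallelize the work. Here is my code: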

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from article import Article
from signal import signal, SIGTERM
import multiprocessing as mp
import sys

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = '*path*chrome.exe'
driver = webdriver.Chrome(executable_path="chromedriver", chrome_options=chrome_options)

def get_article(args):
    title, id, q = args
    article = Article.from_url('https://*url*/article/{}'.format(id), driver, title=title, id=id)
    print('parsed article: ', title)
    q.put(article.to_json())

def file_writer(q):
    with open('data/articles.json', 'w+') as file:
        while True:
            line = q.get()
            if line == 'END':
                break
            file.write(line + '\n')
            file.flush()

if __name__ == '__main__':
    manager = mp.Manager()
    queue = manager.Queue()
    pool_size = mp.cpu_count() - 2
    pool = mp.Pool(pool_size)
    writer = mp.Process(target=file_writer, args=(queue,))
    writer.start()
    with open('data/article_list.csv', 'r') as article_list:
        article_list_with_queue = [(*line.split('|'), queue) for line in article_list]
    pool.map(get_article, article_list_with_queue)
    queue.put('END')
    pool.close()
    pool.join()
    driver.close()

The code runs fine, but after it finishes I am left with about 80 child processes under PyCharm.exe. Most are chrome.exe, and some are chromedriver.exe.

I tried putting

signal(SIGTERM, terminate)

in the worker function and quitting the driver inside terminate(), but that did not work.

You can create a .bat file that kills all of the leftover processes:

@echo off
rem   just kills stray local chromedriver.exe instances.
rem   useful if you are trying to clean your project, and your ide is complaining.
taskkill /im chromedriver.exe /f

and run it after all tests have finished.
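
A more targeted alternative may be to have every worker process quit its own driver. On Windows, multiprocessing starts child processes by re-importing the main module, so the module-level driver = webdriver.Chrome(...) in the question probably runs once in each child, while the final driver.close() only touches the parent's copy. The sketch below is an illustration under assumptions, not the asker's code: make_chrome_driver, scrape, worker and run are hypothetical names, scrape is a stand-in for Article.from_url(...).to_json(), chromedriver is assumed to be on PATH, and plain mp.Process workers are swapped in for mp.Pool so that a try/finally can wrap the whole life of each worker.

```python
import multiprocessing as mp


def make_chrome_driver():
    """Build one headless Chrome driver (assumes chromedriver is on PATH)."""
    # Imported lazily so this module loads even where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def scrape(driver, task):
    """Hypothetical stand-in for Article.from_url(...).to_json()."""
    title, article_id = task
    driver.get("https://*url*/article/{}".format(article_id))
    return title


def worker(tasks, results, driver_factory=make_chrome_driver):
    """Own exactly one driver for the life of this process and always quit it."""
    driver = driver_factory()
    try:
        # iter(callable, sentinel) stops as soon as a None task arrives.
        for task in iter(tasks.get, None):
            try:
                results.put(("ok", scrape(driver, task)))
            except Exception as exc:
                # Report the failure so the parent is not left waiting.
                results.put(("error", repr(exc)))
    finally:
        # quit() ends the session and the chromedriver process;
        # close() would only close the current window.
        driver.quit()


def run(task_list, n_workers=2):
    """Start n_workers processes, feed them tasks, and collect the results."""
    task_list = list(task_list)
    tasks, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for task in task_list:
        tasks.put(task)
    for _ in procs:
        tasks.put(None)  # one stop sentinel per worker
    # Drain the results before joining; they arrive in completion order.
    out = [results.get() for _ in task_list]
    for p in procs:
        p.join()
    return out
```

Calling run([(title, id), ...]) from inside the if __name__ == '__main__': block would start the workers, and each one should shut down its own chrome.exe and chromedriver.exe when it receives the sentinel. This sketch does not handle a driver that fails to start, so the .bat file above can still serve as a fallback.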
