I have a Selenium parser class:
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


class DynamicParser(Parser):
    """Selenium Parser with processing JS"""
    driver: Chrome = None

    def __init__(self, driver_path='./chromedriver', headless=True):
        chrome_options = Options()
        if headless:
            chrome_options.add_argument("--headless")
            chrome_options.add_argument("window-size=1920,1080")
        # bypass OS security
        chrome_options.add_argument('--no-sandbox')
        # overcome limited resources
        chrome_options.add_argument('--disable-dev-shm-usage')
        # don't tell chrome that it is automated
        chrome_options.add_experimental_option(
            "excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        # disable images
        prefs = {"profile.managed_default_content_settings.images": 2}
        chrome_options.add_experimental_option("prefs", prefs)
        # Setting Capabilities
        capabilities = DesiredCapabilities.CHROME.copy()
        capabilities['acceptSslCerts'] = True
        capabilities['acceptInsecureCerts'] = True
        self.driver = Chrome(chrome_options=chrome_options,
                             executable_path=driver_path,
                             desired_capabilities=capabilities)

    def goto(self, url: str):
        """Goes to specified URL"""
        self.driver.get(url)

    def get_seller_name(self) -> str:
        """Returns seller's name"""
        offer_actions_tag = self.driver.find_element_by_class_name(
            'offer-user__actions')
        profile_link_tag = offer_actions_tag.find_element_by_tag_name('a')
        return profile_link_tag.text.strip()
I also have a test script that creates a DynamicParser, goes to a page, and calls .get_seller_name().
I noticed that Chromedriver runs much slower when I run it headless, so I timed it with time python3 test.py.
Headless Chrome output:
python3 test.py 2,98s user 0,94s system 3% cpu 2:04,65 total
Non-headless Chrome output:
python3 test.py 1,48s user 0,33s system 47% cpu 3,790 total
As we can see, headless Chrome runs almost 33 times slower (2:04.65 versus 3.79 seconds of wall-clock time)!
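The 33x figure comes from the wall-clock totals in the two time lines above. Here is a small sketch of that arithmetic, assuming the zsh-style time format with decimal commas shown in the output. The helper name parse_total_seconds is made up for illustration.

```python
def parse_total_seconds(total: str) -> float:
    """Convert a zsh-style `time` total such as '2:04,65' or '3,790' to seconds."""
    # The output above uses a decimal comma; normalise it to a dot first.
    parts = total.replace(",", ".").split(":")
    seconds = 0.0
    for part in parts:
        # Each ':'-separated field is worth 60x the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


headless = parse_total_seconds("2:04,65")  # total from the headless run
headed = parse_total_seconds("3,790")      # total from the non-headless run
slowdown = headless / headed
print(f"headless is about {slowdown:.1f}x slower")
```

Note that the user and system CPU times differ far less than the totals do, which suggests the headless run spends most of its time waiting rather than computing.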
Chrome version: 83.0.4103.116
Chromedriver version: 83.0.4103.39
I really don't understand where the problem is. When I developed my previous application, headless Chrome ran fast enough.
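Just found the problem. It was
chrome_options.add_argument('--disable-dev-shm-usage')
I thought this flag would lift Chrome's resource limits, but in this case it clearly does not help: removing it brought headless mode back to normal speed.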
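One way to confirm a finding like this is to make the suspect flag switchable and time a run with and without it. The following is only a sketch: build_chrome_args, its disable_dev_shm parameter, and make_options are hypothetical names, and the flags are the ones from the question (with window-size written in its usual --window-size form).

```python
from typing import List


def build_chrome_args(headless: bool = True,
                      disable_dev_shm: bool = False) -> List[str]:
    """Return Chrome command-line flags as a list.

    `disable_dev_shm` is off by default because that flag was the cause
    of the slowdown reported above.
    """
    args = ["--no-sandbox"]
    if headless:
        args += ["--headless", "--window-size=1920,1080"]
    if disable_dev_shm:
        args.append("--disable-dev-shm-usage")
    return args


def make_options(args: List[str]):
    """Build a Selenium Options object from a list of flags."""
    # Imported lazily so build_chrome_args can be used without selenium.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in args:
        options.add_argument(arg)
    return options


print(build_chrome_args())
print(build_chrome_args(disable_dev_shm=True))
```

Keeping the flag list separate from the driver setup makes it easy to run the same test script twice and compare timings with only one flag changed.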
When running a headless driver, you can also use these settings to improve performance.
browser_options = webdriver.ChromeOptions()
browser_options.headless = True
image_preferences = {"profile.managed_default_content_settings.images": 2}
browser_options.add_experimental_option("prefs", image_preferences)
None of these worked for me.
However, changing options.add_argument('--headless') to options.add_argument('--headless=new') made a huge difference and seems to have solved the problem completely.
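To check whether the flag change helps on a given machine, the two modes can be timed side by side. This is a rough sketch under several assumptions: a Chrome release recent enough to understand --headless=new (much newer than the Chrome 83 in the question), a Selenium version that accepts webdriver.Chrome(options=...), and a chromedriver that Selenium can locate on its own. The names time_call, load_page, and compare_flags are invented for this example.

```python
import time
from typing import Callable, Dict


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def load_page(headless_flag: str, url: str) -> None:
    """Start Chrome with the given headless flag, open url, then quit."""
    # Imported lazily so time_call works without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(headless_flag)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
    finally:
        driver.quit()


def compare_flags(url: str) -> Dict[str, float]:
    """Time one browser start plus page load for each headless flag."""
    flags = ("--headless", "--headless=new")
    return {flag: time_call(lambda f=flag: load_page(f, url)) for flag in flags}
```

Calling compare_flags with the URL of the page under test returns one timing per flag. Each measurement includes browser startup, so a single run is noisy and it is worth repeating a few times before drawing conclusions.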