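Please note that this question is still open: the suggested "answer" still gives the same output, and it does not explain why the JS does not run on this page or why Selenium cannot extract the content.
I am trying to read the page source of http://147.235.97.36/ (an HP printer), which is rendered by JS.
So I wrote: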
driver.get(url)
wait_for_page(driver)
source = driver.page_source
print(source)
But in the printed source I see:
<p>JavaScript is required to access this website.</p>
<p>Please enable JavaScript or use a browser that supports JavaScript.</p>
and some of the content is missing, so I changed the code to:
driver.get(url)
wait_for_page(driver)
source = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print(source)
The output is still the same. Can you help me understand what the problem is here?
This is my init_driver function:
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

def init_driver():
    # --Initialize Driver--#
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run in Background
    if os.name == 'nt':
        chrome_options.add_argument('--disable-gpu')  # Windows workaround
    prefs = {"profile.default_content_settings.images": 2,
             "profile.managed_default_content_settings.images": 2}  # Disable Loading of Images
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument('--ignore-ssl-errors=yes')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument("--window-size=1920,1080")  # Standard Window Size
    # pageLoadStrategy is a WebDriver capability, not a Chrome command-line switch
    chrome_options.page_load_strategy = "normal"
    driver = None
    try:
        driver = webdriver.Chrome(options=chrome_options, service=Service('./chromedriver'))
        driver.set_page_load_timeout(REQUEST_TIMEOUT)  # REQUEST_TIMEOUT is defined elsewhere
    except Exception as e:
        log_warning(str(e))  # log_warning is defined elsewhere
    return driver
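One thing worth checking about the output above: the `<noscript>` fallback is part of the page markup, so its text can show up in `driver.page_source` or `innerHTML` whether or not scripts ran. Seeing that message in the dump is therefore not, by itself, evidence that JavaScript is disabled. The following is a minimal sketch, using only the standard library, that reports whether the message occurs only inside `<noscript>`. The names `NoscriptTextFinder` and `only_in_noscript` are hypothetical helpers introduced here, not part of Selenium.

```python
from html.parser import HTMLParser


class NoscriptTextFinder(HTMLParser):
    """Records whether a phrase occurs inside or outside <noscript> elements."""

    def __init__(self, phrase):
        super().__init__()
        self.phrase = phrase
        self.depth = 0        # current nesting depth of <noscript>
        self.inside = False   # phrase seen inside <noscript>
        self.outside = False  # phrase seen anywhere else

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.phrase in data:
            if self.depth:
                self.inside = True
            else:
                self.outside = True


def only_in_noscript(source, phrase="JavaScript is required"):
    """Return True if the phrase appears in source, but only within <noscript>."""
    finder = NoscriptTextFinder(phrase)
    finder.feed(source)
    finder.close()
    return finder.inside and not finder.outside
```

Calling `only_in_noscript(driver.page_source)` on the dump would then separate "the fallback markup is present" from "the page is actually showing the fallback".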
You can add a few arguments to avoid being detected, and then print the page source as follows:

Code block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("start-maximized")
# Pass both switches in one call; a second "excludeSwitches" call would overwrite the first
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service(r'C:\BrowserDrivers\chromedriver.exe')  # raw string keeps the backslashes literal
driver = webdriver.Chrome(service=s, options=options)
driver.get("http://147.235.97.36/")
print(driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML"))
Console output:
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link href="/framework/Unified.css" rel="stylesheet" type="text/css">
  <script type="text/javascript">
    frameWorkObj = {};
    frameWorkObj.pkg = "ews";
  </script>
  <script src="/framework/Unified.js" type="text/javascript"></script>
</head>
<body class="theme-gray">
  <iframe src="/framework/cookie/client/cookie.html" style="display: none;"></iframe>
  <div id="pgm-overall-container">
    <div id="pgm-left-pane-bkground"></div>
    <div id="pgm-banner"></div>
    <div id="pgm-search-div" class="gui-hidden"></div>
    <div id="pgm-top-pane"></div>
    <div id="pgm-container-div">
      <div id="pgm-left-pane"></div>
      <div id="pgm-container" class="clear-fix">
        <div id="pgm-title-div" class="gui-hidden"></div>
        <div id="contentPane" class="contentPane"></div>
      </div>
    </div>
    <div id="pgm-footer"></div>
  </div> <!-- #pgm-overall-container -->
  <div id="pgm-theatre-staging-div"></div>
  <script type="text/javascript">
    // frame buster
    if(top != self) top.location.replace(self.location.href);
  </script>
  <noscript>
    <div id="pgm-no-js-text">
      <p>JavaScript is required to access this website.</p>
      <p>Please enable JavaScript or use a browser that supports JavaScript.</p>
    </div>
  </noscript>
  <div id="ui-datepicker-div" style="display: none;" tabindex="0"></div>
</body>
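In this output the container divs such as `contentPane` are empty. If the page fills them asynchronously after load (an assumption; nothing here confirms how or when `/framework/Unified.js` populates the page), reading the source immediately could capture the empty shell. Below is a minimal sketch of polling until an element has child nodes. `wait_for_children` is a hypothetical helper, and it relies only on the driver's `execute_script(script, *args)` method.

```python
import time

# JavaScript run in the page: number of child elements of the element with the given id
_CHILD_COUNT_JS = (
    "var el = document.getElementById(arguments[0]);"
    "return el ? el.children.length : 0;"
)


def wait_for_children(driver, element_id, timeout=10.0, poll=0.5):
    """Poll until the element has at least one child element.

    Returns the child count, or 0 if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = driver.execute_script(_CHILD_COUNT_JS, element_id)
        if count:
            return count
        if time.monotonic() >= deadline:
            return 0
        time.sleep(poll)
```

For example, calling `wait_for_children(driver, "contentPane")` before reading `driver.page_source` would show whether the pane is ever populated in this session. A return value of 0 would suggest the scripts are not rendering content at all, rather than the source simply being read too early.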