Python Selenium can't read JS-rendered content?



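Please note: this question is still open. The suggested "answer" still gives the same output, and it doesn't explain why the JS isn't running on that page or why Selenium can't extract it.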

I'm trying to read the page source of http://147.235.97.36/ (an HP printer), which is rendered by JS.

So I wrote:

driver.get(url)
wait_for_page(driver)
source = driver.page_source
print(source)

But in the printed source I see:

<p>JavaScript is required to access this website.</p>
<p>Please enable JavaScript or use a browser that supports JavaScript.</p>

and some of the content is missing, so I changed the code to:

driver.get(url)
wait_for_page(driver)
source = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print(source)

I still get the same output. Can you help me understand what the problem is here?
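`wait_for_page` isn't shown, but if it only waits for the document to finish loading, it may return before the application has fetched and rendered its content. A minimal sketch of an explicit wait instead, assuming the printer UI renders into the `<div id="contentPane">` container that is empty in the static markup (it appears in the console output of the answer). `content_pane_populated` and `fetch_rendered_source` are hypothetical names, and the 30-second timeout is arbitrary:

```python
def content_pane_populated(driver) -> bool:
    """Wait condition: True once <div id="contentPane"> has at least one child element.

    Assumption: the printer UI renders its pages into #contentPane, which is
    empty in the initial markup. If it renders elsewhere, change the id.
    """
    result = driver.execute_script(
        "var el = document.getElementById('contentPane');"
        "return el !== null && el.children.length > 0;"
    )
    return bool(result)


def fetch_rendered_source(driver, url, timeout=30):
    """Load `url`, wait for #contentPane to be filled, then return the live DOM as HTML."""
    # Imported here so content_pane_populated() can be used without importing Selenium
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # until() calls the condition with the driver repeatedly; raises TimeoutException on timeout
    WebDriverWait(driver, timeout).until(content_pane_populated)
    return driver.page_source
```

Usage would be `source = fetch_rendered_source(driver, url)`. If the wait times out, that suggests the content is rendered into a different element than the one assumed here.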

Here is my init_driver function:

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
# REQUEST_TIMEOUT and log_warning are not shown here (defined elsewhere in my script)

def init_driver():
    # --Initialize Driver--#
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run in Background
    if os.name == 'nt':
        chrome_options.add_argument('--disable-gpu')  # Windows workaround
    prefs = {"profile.default_content_settings.images": 2,
             "profile.managed_default_content_settings.images": 2}  # Disable Loading of Images
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument('--ignore-ssl-errors=yes')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument("--window-size=1920,1080")  # Standard Window Size
    # Page load strategy is a WebDriver capability, not a Chrome command-line switch
    chrome_options.page_load_strategy = "normal"
    driver = None
    try:
        driver = webdriver.Chrome(options=chrome_options, service=Service('./chromedriver'))
        driver.set_page_load_timeout(REQUEST_TIMEOUT)
    except Exception as e:
        log_warning(str(e))
    return driver
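The options in `init_driver` don't disable JavaScript (only images are blocked via `prefs`), so scripts should run. Rather than relying on the `<noscript>` text, one way to check this directly is to test whether the page's own inline script executed. A minimal sketch, assuming the global `frameWorkObj` that the page declares in an inline `<script>` (visible in the console output of the answer) is a reliable marker; `page_script_ran` is a hypothetical helper name:

```python
def page_script_ran(driver) -> bool:
    """Return True if the page's own inline <script> has executed.

    Assumption: the printer page declares a global `frameWorkObj` in an inline
    script, so the variable exists on `window` only if that script ran.
    `driver` is anything with Selenium's execute_script(), e.g. webdriver.Chrome.
    """
    # execute_script evaluates the snippet in the current page and returns its result
    result = driver.execute_script("return typeof window.frameWorkObj !== 'undefined';")
    return bool(result)
```

For example, `driver = init_driver()`, then `driver.get(url)` and `print(page_script_ran(driver))`. A `True` result would point to timing (content not rendered yet) rather than disabled JavaScript.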

You can add a few arguments to avoid being detected, then print the page source like this:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("start-maximized")
    # Pass both switches in one call; a second "excludeSwitches" call would overwrite the first
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    s = Service(r'C:\BrowserDrivers\chromedriver.exe')  # raw string keeps the backslashes literal
    driver = webdriver.Chrome(service=s, options=options)
    driver.get("http://147.235.97.36/")
    print(driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML"))
    
  • Console output:

    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link href="/framework/Unified.css" rel="stylesheet" type="text/css">
    <script type="text/javascript">
    frameWorkObj = {};
    frameWorkObj.pkg = "ews";
    </script>
    <script src="/framework/Unified.js" type="text/javascript"></script>
    </head>
    <body class="theme-gray">
    <iframe src="/framework/cookie/client/cookie.html" style="display: none;"></iframe>
    <div id="pgm-overall-container">
    <div id="pgm-left-pane-bkground"></div>
    <div id="pgm-banner"></div>
    <div id="pgm-search-div" class="gui-hidden"></div>
    <div id="pgm-top-pane"></div>
    <div id="pgm-container-div">
    <div id="pgm-left-pane"></div>
    <div id="pgm-container" class="clear-fix">
    <div id="pgm-title-div" class="gui-hidden"></div>
    <div id="contentPane" class="contentPane"></div>
    </div>
    </div>
    <div id="pgm-footer"></div>
    </div> <!-- #pgm-overall-container -->
    <div id="pgm-theatre-staging-div"></div>
    <script type="text/javascript">
    // frame buster
    if(top != self)
    top.location.replace(self.location.href);
    </script>
    <noscript>
    <div id="pgm-no-js-text">
    <p>JavaScript is required to access this website.</p>
    <p>Please enable JavaScript or use a browser that supports JavaScript.</p>
    </div>
    </noscript>
    
    <div id="ui-datepicker-div" style="display: none;" tabindex="0"></div></body>
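The `<noscript>` block in this output does not by itself mean JavaScript was off: `innerHTML` and `page_source` serialize the whole DOM, and the `<noscript>` element and its contents stay in the document whether or not scripts run. What stands out instead is that the layout containers (`#pgm-banner`, `#pgm-left-pane`, `#contentPane` and so on) are all empty, which suggests the source was captured before the application filled them in. A small stdlib-only sketch for listing the empty `<div>`s in a captured source string; `EmptyContainerFinder` and `empty_container_ids` are hypothetical names:

```python
from html.parser import HTMLParser


class EmptyContainerFinder(HTMLParser):
    """Collect the ids of <div> elements that have no child elements and no text."""

    def __init__(self):
        super().__init__()
        self._open_divs = []  # one [id, has_content] entry per currently open <div>
        self.empty_ids = []

    def _mark_parent_non_empty(self):
        if self._open_divs:
            self._open_divs[-1][1] = True

    def handle_starttag(self, tag, attrs):
        # Any child element (div or not) counts as content of the enclosing <div>
        self._mark_parent_non_empty()
        if tag == "div":
            self._open_divs.append([dict(attrs).get("id"), False])

    def handle_data(self, data):
        # Whitespace between tags is not counted as content
        if data.strip():
            self._mark_parent_non_empty()

    def handle_endtag(self, tag):
        if tag == "div" and self._open_divs:
            div_id, has_content = self._open_divs.pop()
            if div_id and not has_content:
                self.empty_ids.append(div_id)


def empty_container_ids(page_source):
    """Return the ids of empty <div>s in `page_source`, in the order they close."""
    finder = EmptyContainerFinder()
    finder.feed(page_source)
    finder.close()
    return finder.empty_ids
```

Running `print(empty_container_ids(driver.page_source))` right after `driver.get(...)` and again after a wait shows whether `contentPane` ever gets filled.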
    
