Am I being blocked by the website because some parameter is missing? (Scraping with Selenium)



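The page I want to scrape (for practice) is the URL below. I'm trying to scrape the income statement (the chart) at the bottom of the page.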

import time
from PIL import Image
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

# Start a fresh headless PhantomJS session
browser = webdriver.PhantomJS()
browser.implicitly_wait(12)
url = 'https://seekingalpha.com/symbol/OPK/financials/income-statement'

browser.get(url)
time.sleep(9)  # give the page time to render
#x = browser.find_element_by_class_name('content')
y = browser.find_element_by_xpath("//*[@id='industrial-income-statement']")

This code worked until recently. Now I'm getting a "no such element" error for this line: y = browser.find_element_by_xpath("//*[@id='industrial-income-statement']")

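If I print browser.page_source, there is a message about access being denied, but I'm not sure why. I'm only trying to scrape one chart, and I'm using Selenium, which I thought sent the proper headers.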

'0px 25px; padding: 0px; resize: none; "></textarea></div></div></div>\n <p>\n Access to this page has been denied because we believe you are using automation tools to browse the website.\n </p>\n <p>\n This may happen as a result of the following:\n </p>\n <ul>\n <li>\n Javascript is disabled or blocked by an extension (ad blockers for example)\n </li>\n <li>\n Your browser does not support cookies\n </li>\n </ul>\n <p>\n Please make sure that Javascript and cookies are enabled on your browser and that you are not blocking them from loading.\n </p>\n <p>\n Reference ID: #a2a7fe90-4a2a-11e7-be16-a994e7f2d3b8\n </p>\n </div>\n </div>\n <div class="page-footer-wrapper">\n <div class="page-foote

As far as I know, PhantomJS doesn't block JavaScript, and it doesn't block cookies either.

Is there a way around this?

You should pretend not to be PhantomJS to avoid being detected:

capabilities = dict(webdriver.DesiredCapabilities.PHANTOMJS)
capabilities["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
browser = webdriver.PhantomJS(desired_capabilities=capabilities)
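To tell whether the user-agent override actually got past the check, it may help to look for the denial message before looking up the element. The following is only a rough sketch: `is_access_denied` and `find_income_statement` are hypothetical helper names (not part of Selenium), the marker string is copied from the page source quoted in the question, and `browser` is assumed to be a WebDriver from the same older Selenium API used above that has already loaded the page.

```python
# Marker text taken from the denial page quoted in the question.
DENIAL_MARKER = "Access to this page has been denied"


def is_access_denied(page_source):
    """Return True if the HTML looks like the bot-detection page."""
    return DENIAL_MARKER.lower() in page_source.lower()


def find_income_statement(browser):
    """Return the income statement element, or None if the site blocked us.

    Hypothetical helper: `browser` is assumed to be a Selenium WebDriver
    (older API with find_element_by_xpath) that has already loaded the page.
    """
    if is_access_denied(browser.page_source):
        return None
    return browser.find_element_by_xpath("//*[@id='industrial-income-statement']")
```

If `find_income_statement(browser)` returns None, the "no such element" error is most likely a symptom of the block page rather than a wrong XPath.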

I would be cautious about scraping this resource without explicit consent, though - see the "User Conduct" section of the Terms of Use.
