Web scraping with a Selenium Grid Docker cluster



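I am using Selenium Grid in Docker to scrape websites. With a single Chrome node the grid works, but as soon as I scale to multiple Chrome nodes, scraping stops working. The cursor just blinks for a while and then a large error message is shown.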

import scrapy
from selenium import webdriver


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    start_urls = ['https://google.com']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        # Connect to the Selenium Grid hub; the headless flag travels in the capabilities
        self.driver = webdriver.Remote(command_executor='http://localhost:5000/wd/hub',
                                       desired_capabilities=options.to_capabilities())

    def parse(self, response):
        # driver.get() returns None, so read the rendered HTML from page_source
        self.driver.get(response.url)
        data = self.driver.page_source
        print(data, '/////////////')

Then I opened a Python shell and entered the code:

Python 3.6.5 (default, Apr  1 2018, 05:46:30) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selenium import webdriver
>>> from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
>>> options = webdriver.ChromeOptions()
>>> options.add_argument('--headless')
>>> driver = webdriver.Remote(command_executor='http://localhost:5000/wd/hub',
...             desired_capabilities=DesiredCapabilities.CHROME)

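As you can see, it stalls at webdriver.Remote: the cursor just blinks for a long time and then a large error message is shown. I think the problem is in the webdriver.Remote(command_executor='http://localhost:5000/wd/hub', desired_capabilities=DesiredCapabilities.CHROME) line.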

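Can anyone suggest a solution to this problem? Note that it fails when I scale to multiple Chrome nodes, but works when the Selenium Grid has a single Chrome node.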

This is the error message that appears after a long wait:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Error forwarding the new session Error forwarding the request Connect to 172.18.0.8:5555 [/172.18.0.8] failed: Connection timed out (Connection timed out)
Stacktrace:
    at org.openqa.grid.web.servlet.handler.RequestHandler.process (RequestHandler.java:117)
    at org.openqa.grid.web.servlet.DriverServlet.process (DriverServlet.java:84)
    at org.openqa.grid.web.servlet.DriverServlet.doPost (DriverServlet.java:68)
    at javax.servlet.http.HttpServlet.service (HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service (HttpServlet.java:790)
    at org.seleniumhq.jetty9.servlet.ServletHolder.handle (ServletHolder.java:860)
    at org.seleniumhq.jetty9.servlet.ServletHandler.doHandle (ServletHandler.java:535)
    at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextHandle (ScopedHandler.java:188)
    at org.seleniumhq.jetty9.server.session.SessionHandler.doHandle (SessionHandler.java:1595)
    at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextHandle (ScopedHandler.java:188)
    at org.seleniumhq.jetty9.server.handler.ContextHandler.doHandle (ContextHandler.java:1253)
    at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextScope (ScopedHandler.java:168)
    at org.seleniumhq.jetty9.servlet.ServletHandler.doScope (ServletHandler.java:473)
    at org.seleniumhq.jetty9.server.session.SessionHandler.doScope (SessionHandler.java:1564)
    at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextScope (ScopedHandler.java:166)
    at org.seleniumhq.jetty9.server.handler.ContextHandler.doScope (ContextHandler.java:1155)
    at org.seleniumhq.jetty9.server.handler.ScopedHandler.handle (ScopedHandler.java:141)
    at org.seleniumhq.jetty9.server.handler.HandlerWrapper.handle (HandlerWrapper.java:132)
    at org.seleniumhq.jetty9.server.Server.handle (Server.java:530)
    at org.seleniumhq.jetty9.server.HttpChannel.handle (HttpChannel.java:347)
    at org.seleniumhq.jetty9.server.HttpConnection.onFillable (HttpConnection.java:256)
    at org.seleniumhq.jetty9.io.AbstractConnection$ReadCallback.success (AbstractConnection.java:279)
    at org.seleniumhq.jetty9.io.FillInterest.fillable (FillInterest.java:102)
    at org.seleniumhq.jetty9.io.ChannelEndPoint$2.run (ChannelEndPoint.java:124)
    at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.doProduce (EatWhatYouKill.java:247)
    at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.produce (EatWhatYouKill.java:140)
    at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.run (EatWhatYouKill.java:131)
    at org.seleniumhq.jetty9.util.thread.ReservedThreadExecutor$ReservedThread.run (ReservedThreadExecutor.java:382)
    at org.seleniumhq.jetty9.util.thread.QueuedThreadPool.runJob (QueuedThreadPool.java:708)
    at org.seleniumhq.jetty9.util.thread.QueuedThreadPool$2.run (QueuedThreadPool.java:626)
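
The error message says the hub timed out while connecting to 172.18.0.8:5555, which appears to be the address one of the nodes registered with. A minimal sketch for checking whether such an address accepts TCP connections, meant to be run from wherever the hub runs (for example inside the hub container). The helper name `can_connect` and the 5-second default timeout are illustrative assumptions and are not part of Selenium.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (including socket.timeout) when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("172.18.0.8", 5555)` with the address from the error message would show whether the node is reachable at all. If it is not, the problem would be in the Docker networking between hub and nodes rather than in the Python client.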

I have also attached a screenshot of the Selenium Grid console taken while multiple nodes were running (see the linked image).

It looks like you are starting the new Selenium nodes with Firefox, but your test specifically asks for Chrome.
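
To illustrate the mismatch described here: each node advertises the browsers it can run, and a new-session request can only be served by a node whose browserName matches the requested one. The following is a deliberately simplified model of that idea, not the Grid's actual matching code. The function `find_matching_nodes` and the node dictionaries are hypothetical.

```python
def find_matching_nodes(requested, nodes):
    """Return the nodes whose advertised browserName equals the requested one."""
    wanted = requested.get("browserName", "").lower()
    return [n for n in nodes if n.get("browserName", "").lower() == wanted]


# Hypothetical grid with two Firefox nodes and no Chrome node.
nodes = [
    {"host": "node-1", "browserName": "firefox"},
    {"host": "node-2", "browserName": "firefox"},
]

# The test requests Chrome, so no node qualifies.
requested = {"browserName": "chrome"}
print(find_matching_nodes(requested, nodes))  # prints []
```

In this model a Chrome request against Firefox-only nodes finds nothing to run on, so the fix would be either to start Chrome nodes or to request Firefox.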

I suggest using Zalenium to set up the Selenium Grid: https://github.com/zalando/zalenium
