Passing arguments to a Scrapy spider via docker run



I have a Scrapy + Selenium spider wrapped in a Docker container. I want to run that container while passing some args to the spider. However, for some reason I get a strange error message. Before posting this question I searched extensively and tried many different options.

Dockerfile

FROM python:2.7
# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable
# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
# install xvfb
RUN apt-get install -yqq xvfb
# install pyvirtualdisplay
RUN pip install pyvirtualdisplay
# set display port and dbus env to avoid hanging
ENV DISPLAY=:99
ENV DBUS_SESSION_BUS_ADDRESS=/dev/null
#install scrapy
RUN pip install --upgrade pip && \
    pip install --upgrade \
        setuptools \
        wheel && \
    pip install --upgrade scrapy
# install selenium
RUN pip install selenium==3.8.0
# install xlrd
RUN pip install xlrd
# install bs4
RUN pip install beautifulsoup4
ADD . /tralala/
WORKDIR tralala/
CMD scrapy crawl personel_spider_mpc -a chunksNo=$chunksNo -a chunkI=$chunkI

I guess the problem might be in the CMD part.

The spider's initialization part:

class Crawler(scrapy.Spider):
    name = "personel_spider_mpc"
    allowed_domains = ['tralala.de',]
    def __init__(self, vdisplay = True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        self.chunkI = chunkI
        self.chunksNo = chunksNo

How I run the container:

docker run --env chunksNo='10' --env chunkI='1' ostapp/tralala

I tried the values both with and without quotes.

The error message:

2018-04-04 16:42:32 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/tralala/tralala/spiders/tralala_spider_mpc.py", line 673, in __init__
    self.chunkI = chunkI
NameError: global name 'chunkI' is not defined

Your arguments are stored in kwargs, which is just a dict whose keys act as the argument names and whose values act as the argument values. It does not define those names for you, which is why you get the NameError.
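To illustrate, here is a minimal sketch without the Scrapy dependency (the class and the example values are just placeholders): extra keyword arguments land in the kwargs dict, not in the local namespace.

```python
class Crawler(object):
    def __init__(self, vdisplay=True, **kwargs):
        # Extra keyword arguments land in the kwargs dict; its keys do NOT
        # become local variables, so a bare `chunkI` would raise NameError here.
        self.chunkI = kwargs['chunkI']
        self.chunksNo = kwargs['chunksNo']

spider = Crawler(chunkI='1', chunksNo='10')
print(spider.chunkI, spider.chunksNo)  # → 1 10
```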

See this answer for more details.

In your specific case, try self.chunkI = kwargs['chunkI'] and self.chunksNo = kwargs['chunksNo'].
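As a side note, instead of routing the values through environment variables you could override the image's CMD at run time and pass the -a arguments directly (a sketch with the same example values; everything after the image name replaces the CMD):

```
docker run ostapp/tralala scrapy crawl personel_spider_mpc -a chunksNo=10 -a chunkI=1
```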
