I want to call all my spiders from one main.py file



I have built 8 crawlers with Scrapy, in 8 separate .py files. Now I want to create one file, say init.py, that is responsible for invoking the spiders on demand. How can I do that?

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapycrawler.items import ScrapycrawlerItem

I also tried this:

process = CrawlerProcess({
    'FEED_FORMAT': 'csv',   # feed format names are lowercase
    'FEED_URI': 'output.csv',
})
process.crawl(spider_class)  # spider_class is one spider's class
process.start()

items.py:

import scrapy

class ScrapycrawlerItem(scrapy.Item):
    # define the fields for your item here, like:
    date_publish = scrapy.Field()
    date_updated = scrapy.Field()
    headline = scrapy.Field()
    maintext = scrapy.Field()
    description = scrapy.Field()
    image_url = scrapy.Field()
    article_url = scrapy.Field()

Directory structure:

Crawler
|-scrapycrawler
|  |-__pycache__
|  | |-...
|  |-spiders
|  | |-__pycache__
|  | |-__init__.py
|  | |-crawler1.py
|  | |-crawler2.py
|  | |-crawler3.py
|  | |-crawler4.py
|  | |-crawler5.py
|  | |-crawler6.py
|  | |-crawler7.py
|  | |-crawler8.py
|  |-__init__.py
|  |-items.py
|  |-middlewares.py
|  |-pipelines.py
|  |-settings.py
|-scrapy.cfg

I already have `from scrapycrawler.items import ScrapycrawlerItem`, but it shows this error:

from scrapycrawler.items import ScrapycrawlerItem 
ModuleNotFoundError: No module named 'scrapycrawler'
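That `ModuleNotFoundError` usually means the script was started from a directory where Python cannot see the `scrapycrawler` package. Running the script from the `Crawler` directory (the one containing scrapy.cfg) fixes it; alternatively the directory that contains the package can be put on `sys.path` explicitly. A small sketch, assuming the script sits next to scrapy.cfg:

```python
import sys
from pathlib import Path

# The script lives next to scrapy.cfg, so its own directory is the one
# that contains the `scrapycrawler` package. Putting that directory on
# sys.path lets `from scrapycrawler.items import ScrapycrawlerItem` resolve
# no matter which working directory the script is launched from.
project_dir = Path(__file__).resolve().parent
if str(project_dir) not in sys.path:
    sys.path.insert(0, str(project_dir))
```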

In fact, you can try this:

import subprocess
subprocess.call(["scrapy", "crawl", "<spider_name>"])  # a bare string only works with shell=True

I have tried this before.
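The `subprocess` route works too, as long as the command is passed as an argument list rather than one string. One sketch that launches the eight spiders one after another; the entries in `SPIDER_NAMES` are hypothetical and should be replaced with each spider's `name` attribute:

```python
# run_spiders.py -- launch each spider as its own scrapy process, sequentially
import subprocess

# Hypothetical spider names; use the `name` attribute of each spider class.
SPIDER_NAMES = ["crawler1", "crawler2"]


def build_command(spider_name):
    # An argument list needs no shell quoting and no shell=True.
    return ["scrapy", "crawl", spider_name]


def run_all():
    for name in SPIDER_NAMES:
        # check=True raises CalledProcessError if a crawl exits non-zero.
        subprocess.run(build_command(name), check=True)


if __name__ == "__main__":
    run_all()
```

Each spider gets a fresh process this way, which also sidesteps the "reactor already running" problem of starting `CrawlerProcess` twice.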
