Multi-page scraping with Scrapy 0.22.1: what does the "cannot import name CrawlSpider" error mean?



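I'm trying to write a spider that crawls across multiple pages of the following URL:
http://bookshop.lawsociety.org.uk/ecom_lawsoc/public/saleproducts.jsf?catId=EBOOK

I'm using Scrapy version 0.22.1. However, I get the message "cannot import name CrawlSpider". I've pasted the spider's code below. Can anyone spot where I'm going wrong?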

from scrapy.spider import CrawlSpider, Rule
from scrapy.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import BookpagesItem 
class BookpagesSpider(CrawlSpider):
    name = "book_sample"
    allowed_domains = ["bookshop.lawsociety.org.uk"]
    start_urls = ["http://bookshop.lawsociety.org.uk/ecom_lawsoc/public/saleproducts.jsf?catId=EBOOK"]
    rules = (
        Rule(SgmlLinkExtractor(allow=('//*[@id="productList:scrollernext"]', )), callback='parse_item', follow=True),
        Rule(SgmlLinkExtractor(allow=('//p/a[contains(@id, "productList")]', )), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@class="dataListDiv"]')
        items = []
        for site in sites:
            item = BooksItem()
            item['title'] = site.xpath('//div/a/h3[@class="saleProductsTitle"]/text()').extract()
            item['link'] = site.xpath('//p/a[contains(@id, "productList")]').extract()
            item['price'] = site.xpath('//*[@class="saleProductsPrice"]/text()').extract()
            item['category'] = site.xpath('//span[contains(@id, "category")]/text()').extract()
            item['authors'] = site.xpath('//span[contains(@id, "author")]/text()').extract()
            item['date'] = site.xpath('//span[contains(@id, "publicationDate")]/text()').extract()
            item['publisher'] = site.xpath('//span[contains(@id, "publisher")]/text()').extract()
            item['isbn'] = site.xpath('//span[contains(@id, "isbn")]/text()').extract()
            items.append(item)
        return items

The items.py code is:

from scrapy.item import Item, Field
class BookpagesItem(Item):
    # define the fields for your item here like:
    # name = Field()
    title = Field()
    link = Field()
    price = Field()
    category = Field()
    authors = Field()
    date = Field()
    publisher = Field()
    isbn = Field()

The error means that the line "from scrapy.spider import CrawlSpider, Rule" is incorrect: the scrapy.spider module does not define CrawlSpider.

Going by the Scrapy documentation, it should probably be "from scrapy.contrib.spiders import CrawlSpider" (Rule is imported from the same module).
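Because CrawlSpider has lived in different modules across Scrapy releases (scrapy.contrib.spiders in the 0.2x series, scrapy.spiders from Scrapy 1.0 onward), a spider meant to run on more than one version could try the candidate locations in order. The sketch below is a hypothetical helper, import_first, not part of Scrapy; it avoids f-strings so it also runs on Python 2.7, and its demonstration uses only the standard library.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it.

    Hypothetical helper (not a Scrapy API). Raises ImportError listing every
    location tried if none of them provides the name.
    """
    tried = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:
            # The module itself is missing from this installation.
            tried.append("%s (%s)" % (path, exc))
            continue
        if hasattr(module, name):
            return getattr(module, name)
        # The module exists but does not define the name: the situation
        # behind "cannot import name CrawlSpider".
        tried.append("%s (no attribute %r)" % (path, name))
    raise ImportError("cannot import name %r; tried: %s" % (name, ", ".join(tried)))


if __name__ == "__main__":
    # Stdlib demonstration that runs without Scrapy installed:
    # "os" has no OrderedDict, so the helper falls through to "collections".
    OrderedDict = import_first("OrderedDict", ["os", "collections"])
    print(OrderedDict.__module__)  # collections
```

In the spider you would then write, for example, CrawlSpider = import_first("CrawlSpider", ["scrapy.spiders", "scrapy.contrib.spiders"]), and the same for Rule. If you only ever run Scrapy 0.22, the plain "from scrapy.contrib.spiders import CrawlSpider, Rule" is simpler and clearer.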

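Whenever you get an "ImportError: cannot import name foo" error, one of your import statements is asking a module for a name it does not define, so you can narrow the problem down to your import statements. You can then look up the correct location in the library's documentation, or in the source code itself (if it is available).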
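Searching the source can also be automated. The following is a minimal sketch using the standard library's pkgutil and importlib: the hypothetical helper find_name imports every submodule of an installed package and reports which ones have the requested name in their namespace, re-exports included. It assumes the package is trusted, since importing a module runs its top-level code.

```python
import importlib
import pkgutil


def find_name(package_name, name):
    """Return the modules in a package (and its subpackages) whose namespace
    contains `name`, including modules that merely re-export it.

    Every submodule is imported, which runs its top-level code, so only use
    this on installed packages you trust. Submodules that fail to import are
    skipped.
    """
    package = importlib.import_module(package_name)
    hits = [package_name] if name in vars(package) else []
    # __path__ exists only on packages, so package_name must not be a plain module.
    for _finder, mod_name, _is_pkg in pkgutil.walk_packages(
        package.__path__, prefix=package_name + ".", onerror=lambda _name: None
    ):
        try:
            module = importlib.import_module(mod_name)
        except Exception:
            continue
        if name in vars(module):
            hits.append(mod_name)
    return hits


if __name__ == "__main__":
    # json re-exports JSONDecoder from json.decoder, so both should be listed.
    print(find_name("json", "JSONDecoder"))
```

Pointed at a Scrapy 0.22 installation, find_name("scrapy", "CrawlSpider") would be expected to include scrapy.contrib.spiders and not scrapy.spider, which is the mismatch behind the original error. Walking all of Scrapy imports a lot of modules, so it can take a while, and submodules whose optional dependencies are missing are silently skipped.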

I searched the Scrapy docs and found this: http://doc.scrapy.org/en/0.24/topics/spiders.html#crawlspider

The import statement is incorrect. The correct way to write it is: from scrapy.contrib.spiders import CrawlSpider, Rule
