XPath selection returns only the first result from the response



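I'm still a beginner. When I try to read data from quotes.toscrape, the XPath selectors do not return the expected content: every row repeats the first quote. As soon as I use CSS selectors, everything works as intended. The example is very simple, but I just can't find the error.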

quotes.py

import scrapy
from quotes_loader.items import QuotesLoaderItem as QL


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = [
        'http://quotes.toscrape.com//']

    def parse(self, response):
        item = QL()
        quotes = response.xpath('//div[@class="quote"]')
        for quote in quotes:
            # CSS selector
            # item['author_name'] = quote.css('small.author::text').get()
            # item['quote_text'] = quote.css('span.text::text').get()
            # item['author_link'] = quote.css('small.author + a::attr(href)').get()
            # item['tags'] = quote.css('div.tags > a.tag::text').get()
            # XPath selector
            item['author_name'] = quote.xpath('//small[@class="author"]/text()').get()
            item['quote_text'] = quote.xpath('//span[@class="text"]/text()').get()
            item['author_link'] = quote.xpath('//small[@class="author"]/following-sibling::a/@href').get()
            item['tags'] = quote.xpath('//*[@class="tags"]/*[@class="tag"]/text()').get()
            yield item
        # next_page_url = response.css('li.next > a::attr(href)').get()
        next_page_url = response.xpath('//*[class="next"]/a/@href').extract_first()
        absolute_next_page_url = response.urljoin(next_page_url)
        yield scrapy.Request(absolute_next_page_url)

items.py

import scrapy
from scrapy.loader import ItemLoader


class QuotesLoaderItem(scrapy.Item):
    # define the fields for your item here like:
    author_name = scrapy.Field()
    quote_text = scrapy.Field()
    author_link = scrapy.Field()
    tags = scrapy.Field()

Result

author_name,quote_text,author_link,tags
Albert Einstein,“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”,/author/Albert-Einstein,change
Albert Einstein, ...
...
(20 times)

Thanks for your help.

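I was calling xpath() on a selector object (each quote) rather than on the response object. An expression that starts with // searches from the document root even when it is called on a sub-selector, which is why every row repeated the first quote. The expressions inside the loop therefore have to be relative and start with .//, so the syntax must look like this.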

import scrapy
from quotes_loader.items import QuotesLoaderItem as QL


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = [
        'http://quotes.toscrape.com//']

    def parse(self, response):
        quotes = response.xpath('//div[@class="quote"]')
        for quote in quotes:
            # Create a fresh item per quote so yielded items do not share state.
            item = QL()
            # CSS selector
            # item['author_name'] = quote.css('small.author::text').get()
            # item['quote_text'] = quote.css('span.text::text').get()
            # item['author_link'] = quote.css('small.author + a::attr(href)').get()
            # item['tags'] = quote.css('div.tags > a.tag::text').get()

            # XPath selector
            # The leading "." makes each expression relative to the current quote.
            item['author_name'] = quote.xpath('.//small[@class="author"]/text()').get()
            item['quote_text'] = quote.xpath('.//span[@class="text"]/text()').get()
            item['author_link'] = quote.xpath('.//small[@class="author"]/following-sibling::a/@href').get()
            item['tags'] = quote.xpath('.//*[@class="tags"]/*[@class="tag"]/text()').get()
            yield item
        # next_page_url = response.css('li.next > a::attr(href)').get()
        # Attribute tests need "@": [@class="next"], not [class="next"].
        next_page_url = response.xpath('.//*[@class="next"]/a/@href').extract_first()
        # Only follow the link when a next page exists (there is none on the last page).
        if next_page_url:
            absolute_next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(absolute_next_page_url)
