Populating a Scrapy item with multiple key-value pairs

My code is as follows:

# items.py
import scrapy

class FcsItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()

# test.py (a separate file from items.py above)
import scrapy
from fcs.items import FcsItem

class FCScrape(scrapy.Spider): 
    name = "FC"
    allowed_domains = ["finalcall.com"]
    start_urls = ["http://www.finalcall.com/artman/publish/Columns_4/index.shtml"]
    def parse(self, response):
        item = FcsItem()
        divs_title = response.selector.xpath('//div[@class="category-story"]')
            
        for title, link in zip(divs_title.xpath('.//a/text( )'), divs_title.xpath('.//a/@href')):
            item['title'] = title.extract()
            item['link'] = link.extract()
            #I'm actually trying to attach the title as a string as the key and the link as a string as the value in one dictionary.

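I've tried many different approaches, but the problem I keep running into is that I can't get all of the key:value pairs, only one. How can I modify my code to achieve this?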

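Because of how this page is set up and how you are selecting, everything you scrape comes out as a single pair, in the form of a tuple. When you do zip(divs_title.xpath('.//a/text( )'), divs_title.xpath('.//a/@href')), you get back a one-item list of the tag text and a one-item list of the href contents. Zip them together and you get one thing. On top of that, item = FcsItem() is created only once, before the loop, so every iteration overwrites the same title and link fields; only the last pair would remain, and since the item is never yielded, Scrapy never receives it at all.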
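The overwrite effect can be reproduced without Scrapy. This minimal sketch uses a plain dict as a stand-in for FcsItem (an assumption: assigning item['title'] on a scrapy.Item behaves like assigning to a dict key) and hypothetical sample pairs; it contrasts one shared item with one fresh item per pair.

```python
# Hypothetical (title, link) pairs standing in for scraped results.
pairs = [("Title A", "/a.shtml"), ("Title B", "/b.shtml"), ("Title C", "/c.shtml")]

def one_shared_item(pairs):
    # Mirrors the question: a single item created before the loop.
    item = {}
    for title, link in pairs:
        item["title"] = title  # overwrites the previous title every iteration
        item["link"] = link    # overwrites the previous link every iteration
    return item

def one_item_per_pair(pairs):
    # A fresh item is created inside the loop, so every pair survives.
    items = []
    for title, link in pairs:
        items.append({"title": title, "link": link})
    return items

print(one_shared_item(pairs))
# {'title': 'Title C', 'link': '/c.shtml'}
print(len(one_item_per_pair(pairs)))
# 3
```

Only the last pair survives in the shared item, which matches the "only one key:value pair" symptom described in the question.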

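The (not-so-great) solution is to keep all of these in a dictionary of key:value pairs, as you asked. To do that, loop over the "category-story" divs, since those are the articles you want. You don't need an item here, because you don't really seem to be using it as one: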

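def parse(self, response):
    the_dict = {}
    # Each div.category-story holds one article
    for article in response.selector.xpath('//div[@class="category-story"]'):
        # extract_first() returns one string (or None); extract() returns a list,
        # and a list cannot be used as a dict key (TypeError: unhashable type)
        title = article.xpath('.//a/text()').extract_first()
        link = article.xpath('.//a/@href').extract_first()
        the_dict[title] = link
    # Scrapy accepts plain dicts yielded from a callback
    yield the_dict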
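The per-article loop can be tried offline with the standard library. This sketch swaps Scrapy selectors for xml.etree.ElementTree (a stand-in, since the Scrapy version needs a live response), and the markup is hypothetical, not the real finalcall.com page: it only assumes each article sits in its own div with class "category-story" containing one link.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (assumed structure, not the real page).
SAMPLE_HTML = """
<html><body>
  <div class="category-story"><a href="/story1.shtml">First story</a></div>
  <div class="category-story"><a href="/story2.shtml">Second story</a></div>
  <div class="sidebar"><a href="/ad.shtml">Not an article</a></div>
</body></html>
"""

def titles_to_links(markup):
    root = ET.fromstring(markup.strip())
    the_dict = {}
    # Loop over the article containers rather than zipping page-wide lists.
    for article in root.iterfind(".//div[@class='category-story']"):
        a = article.find(".//a")
        if a is None:
            continue  # skip containers that have no link
        # One string per article: the link text is the key, href is the value.
        the_dict[a.text] = a.get("href")
    return the_dict

print(titles_to_links(SAMPLE_HTML))
# {'First story': '/story1.shtml', 'Second story': '/story2.shtml'}
```

Links outside the category-story divs (the sidebar link here) are never visited, and every article contributes exactly one key.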

A better solution (and what looks like your end goal) is to keep using your items and let an item pipeline handle whatever you want to do with them:

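def parse(self, response):
    # One new FcsItem per article, so no item is overwritten
    for article in response.selector.xpath('//div[@class="category-story"]'):
        item = FcsItem()
        # extract_first() stores a single string instead of a one-element list
        item['title'] = article.xpath('.//a/text()').extract_first()
        item['link'] = article.xpath('.//a/@href').extract_first()
        # Hand each item to Scrapy; enabled pipelines receive it next
        yield item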
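As a sketch of the pipeline side, the hypothetical TitleLinkPipeline below gathers every yielded item into one {title: link} dict and writes it out as JSON when the spider closes. open_spider, process_item and close_spider are Scrapy's item-pipeline hooks; the class name and the "titles.json" output path are assumptions made for illustration.

```python
import json

class TitleLinkPipeline:
    """Hypothetical pipeline: collects all items into one {title: link} dict
    and dumps it to a JSON file when the spider closes."""

    def __init__(self, output_path="titles.json"):
        # Assumed output location; adjust as needed.
        self.output_path = output_path
        self.title_to_link = {}

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts: reset the collected pairs.
        self.title_to_link = {}

    def process_item(self, item, spider):
        # Called once per yielded item; works with FcsItem or a plain dict.
        self.title_to_link[item["title"]] = item["link"]
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes: write the whole dict once.
        with open(self.output_path, "w", encoding="utf-8") as f:
            json.dump(self.title_to_link, f, ensure_ascii=False, indent=2)
```

To enable it, the class would be listed in ITEM_PIPELINES in settings.py, for example {"fcs.pipelines.TitleLinkPipeline": 300}, assuming it lives in fcs/pipelines.py. Keeping the dict in the pipeline means the spider stays a simple item producer, and the same items can still be exported with Scrapy's built-in feed exports.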
