Scrapy item object error

I have a problem with Scrapy item objects. Currently, when I scrape certain fields, I save them like this:

item['tag'] = response.xpath("//div[contains(@class, 'video-info-row showLess')]"
                                     "//a[contains(@href, '/video/search?search')]/text()").extract()

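On each pass, multiple tags are scraped and saved to item['tag']. When I then upload the tags to my SQL server, I get a MySQL syntax error. The cause is obvious: it tries to insert something like 'tag1', u'tag2', u'tag3', u'tag4', u'tag5', u'tag6'. Is there a way to get rid of the quotes? I have already tried .replace("'", ""), but it did not work.

You need to set a Join() output processor for that specific field: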

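import scrapy
# In recent Scrapy releases Join is imported from itemloaders.processors
from itemloaders.processors import Join

class MyItem(scrapy.Item):
    # Join the extracted tags into one comma-separated string
    tag = scrapy.Field(output_processor=Join(separator=','))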

Building on alecxe's answer: processors only work together with Item Loaders (http://doc.scrapy.org/en/latest/topics/loaders.html):

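from scrapy.loader import ItemLoader

def parse(self, response):
    # Pass the response by keyword so the loader builds its own selector
    loader = ItemLoader(item=MyItem(), response=response)
    loader.add_xpath('tag', "//a[contains(@href, '/video/search?search')]/text()")
    return loader.load_item()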

Another solution is to simply use the string join method:

def parse(self, response):
    item = MyItem()
    # extract() returns a list of strings; join them into a single string
    tags = response.xpath("//a[contains(@href, '/video/search?search')]/text()").extract()
    item['tag'] = ','.join(tags)
    return item
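
To illustrate why the original insert failed and what the join changes, here is a minimal sketch. The tag values and the `videos` table are made up for illustration, and sqlite3 stands in for MySQL only so the snippet runs without a server. With a MySQL driver such as MySQLdb or PyMySQL the placeholder would be `%s` rather than `?`.

```python
import sqlite3

# Hypothetical scraped values, standing in for the list returned by .extract()
tags = ['tag1', 'tag2', 'tag3']

# Converting the list itself to text keeps the brackets and quotes,
# which is the kind of value that produced the SQL syntax error
as_list_repr = str(tags)

# ','.join() produces one plain string without any quotes
joined = ','.join(tag.strip() for tag in tags)

# sqlite3 is an assumption made for portability; the table name is hypothetical
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE videos (tags TEXT)')

# A parameterized query lets the driver handle quoting of the value
conn.execute('INSERT INTO videos (tags) VALUES (?)', (joined,))
stored = conn.execute('SELECT tags FROM videos').fetchone()[0]
conn.close()
```

Passing the joined string as a query parameter, rather than formatting it into the SQL text, also avoids the quoting problem if a tag ever contains an apostrophe.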
