Scrapy: Changing how the data is output



I have a problem that I can't figure out right now.

Because of the site's structure, the data I capture into a JSON file looks like this:

[{"location": ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"],
"leisuretitle": ["2014", "20140721", "20140726"]}]

But the format I want is:

[{"leisurelocation": ["(\u5357\u6295)"], "leisuretitle": ["2014"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140721"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140726"]}]
Here is my code. I don't know how to do this; can anyone point me in the right direction?
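from scrapy.selector import Selector
# LeisureItem is the Item class defined in the project's items.py

def parse(self, response):
    sel = Selector(response)
    sites = sel.css("div#listabc table")
    for site in sites:
        # One item per <table>: each .extract() below returns the values
        # from every row of the table, so they all land in a single item.
        item = LeisureItem()
        leisurelocation = site.css("tr > td.subject > span.city::text").extract()
        leisuretitle = site.css("tr > td.subject a::text").extract()
        item['leisurelocation'] = leisurelocation
        item['leisuretitle'] = leisuretitle
        yield item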

What you want is to generate multiple items from leisurelocation and leisuretitle:

leisurelocation = ...
leisuretitle =  ...
for i,j in zip(leisurelocation, leisuretitle):
    yield LeisureItem(leisurelocation=[i], leisuretitle=[j])
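To see what the zip() loop produces without running a spider, here is a minimal standalone sketch. It assumes a plain dict is an acceptable stand-in for LeisureItem (Scrapy items behave like dicts), and the helper name split_columns is hypothetical. The input lists are the values from the question's output.

```python
import json


def split_columns(locations, titles):
    """Turn two parallel lists into one record per position.

    Mirrors the zip() loop above, with a plain dict standing in for
    LeisureItem (hypothetical helper, not part of Scrapy).
    Note: zip() stops at the shorter list, so unmatched trailing
    values are silently dropped.
    """
    return [
        {"leisurelocation": [loc], "leisuretitle": [title]}
        for loc, title in zip(locations, titles)
    ]


# Values from the question's scraped output.
locations = ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"]
titles = ["2014", "20140721", "20140726"]

records = split_columns(locations, titles)
# json.dumps escapes non-ASCII by default, matching the question's format.
print(json.dumps(records))
```

This only works because the two lists happen to line up index by index; if one row on the page lacks a location, every later pair shifts.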

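kev's answer is correct for the problem as you defined it, but I don't think it's the right approach. You should scrape the rows one at a time.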

For example, loop over the table row by row and yield each row as an item:

def parse(self, response):
    for city in response.css("div#listabc table>tr"):
        item = LeisureItem()
        item['leisurelocation'] = city.css("td.subject>span.city::text").extract()
        item['leisuretitle'] = city.css("td.subject a::text").extract()
        yield item
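Why per-row scraping is safer can be sketched without Scrapy at all. In this simulation (an assumption, not real page data), each dict stands in for one table row, and the middle row is imagined to have no span.city. Extracting each column over the whole table, as in the question, collects only the matches, so zip() pairs a location with the wrong title. Building the item from a single row keeps each row's values together.

```python
# Simulated table: each dict is what td.subject > span.city and
# td.subject a would match in one <tr> (None = no match in that row).
# The missing city in the middle row is a hypothetical case.
rows = [
    {"city": "(\u5357\u6295)", "title": "2014"},
    {"city": None, "title": "20140721"},
    {"city": "(\u53f0\u5357)", "title": "20140726"},
]


def column_wise(rows):
    # Like .extract() over the whole table: only matches are collected,
    # so a missing city leaves no placeholder and the lists drift apart.
    cities = [r["city"] for r in rows if r["city"] is not None]
    titles = [r["title"] for r in rows if r["title"] is not None]
    return [
        {"leisurelocation": [c], "leisuretitle": [t]}
        for c, t in zip(cities, titles)
    ]


def row_wise(rows):
    # Like looping over table>tr: each item is built from one row only,
    # so a missing city just yields an empty list for that item.
    items = []
    for r in rows:
        items.append({
            "leisurelocation": [r["city"]] if r["city"] is not None else [],
            "leisuretitle": [r["title"]] if r["title"] is not None else [],
        })
    return items


print(column_wise(rows))
print(row_wise(rows))
```

In the column-wise result, the second location is attached to the title from the row that had no location, and the last title is dropped; the row-wise result keeps all three rows correctly aligned.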
