Scrapy 2.0.1: How to define the output order



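When scraping, the order of the fields in my output does not match the order in which they are written in the spider/item file.

For example: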

def parse(self, response):
    complete_article = response.xpath('//div[@class="storywrapper"]')
    for article in complete_article:
        dachzeile = article.xpath('.//div[@class="meldungHead"]/h1/...
        headline = article.xpath('.//div[@class="meldungHead"]/h1/...
        date = article.xpath('//meta[@name="date"]...
        datum = date.split("T")[0]
        uhrzeit = date.split("T")[1]
        ueberschrift = article.xpath('.//div[@class="mod ....
        text = article.xpath('//div[@class="storywra...
        relative_image = article.xpath('//div[@class="media ...
        final_image = self.base_url + relative_image
        url = response.url.encode('utf-8')
        items = testItem()
        items['Dachzeile'] = dachzeile
        items['Titel'] = headline
        items['Datum'] = datum
        items['Zeit'] = uhrzeit
        items['Einleitung'] = ueberschrift
        items['Artikel'] = text
        items['Bild'] = final_image
        items['Adresse'] = url
        yield items

But the output in the JSON file looks like this:

[
    {
        "Artikel": "....",
        "Einleitung": "...",
        "Titel": "...",
        "Zeit": "19:43:10",
        "Datum": "2020-03-28",
        "Adresse": "....html",
        "Bild": "...",
        "Dachzeile": "..."
    }
]

How can I set the order of the fields in the output file?

Best regards and thanks in advance!

You can use an OrderedDict to preserve the order in which the fields are assigned:

from collections import OrderedDict

for article in complete_article:
    # ... your code
    items = OrderedDict()
    items['Dachzeile'] = dachzeile
    items['Titel'] = headline
    items['Datum'] = datum
    items['Zeit'] = uhrzeit
    items['Einleitung'] = ueberschrift
    items['Artikel'] = text
    items['Bild'] = final_image
    items['Adresse'] = url
    yield items
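
To see why this helps, here is a minimal sketch that runs outside Scrapy, using only the standard library. It assumes the field names from the question; `FIELD_ORDER` and `order_item` are hypothetical names made up for this illustration, and the values are placeholders. It shows that an OrderedDict keeps its keys in the order you put them in, and that this order survives JSON serialization.

```python
import json
from collections import OrderedDict

# Desired output order, copied from the field names in the question.
# FIELD_ORDER is a hypothetical helper constant, not part of Scrapy.
FIELD_ORDER = [
    "Dachzeile", "Titel", "Datum", "Zeit",
    "Einleitung", "Artikel", "Bild", "Adresse",
]


def order_item(scraped, field_order=FIELD_ORDER):
    """Return an OrderedDict whose keys follow field_order.

    Keys that are missing from `scraped` are skipped, and keys that are
    not listed in field_order are dropped.
    """
    return OrderedDict(
        (key, scraped[key]) for key in field_order if key in scraped
    )


# Placeholder values, deliberately given in a scrambled order.
scraped = {
    "Artikel": "text",
    "Titel": "headline",
    "Dachzeile": "kicker",
    "Zeit": "19:43:10",
    "Datum": "2020-03-28",
}

ordered = order_item(scraped)

# json.dumps writes the keys in the order the mapping yields them.
serialized = json.dumps(ordered, ensure_ascii=False)
print(serialized)
```

In the spider, the equivalent would be to yield the ordered mapping instead of the `testItem` instance. Scrapy also documents a `FEED_EXPORT_FIELDS` setting that lists the fields to export and their order. I have not checked how it behaves with the JSON exporter on 2.0.1, so verify it against the feed exports documentation for your version before relying on it.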
