我可以将 while 循环与 'i' 一起使用作为将在 xpath 的 tr[i] 中使用的变量吗?


import scrapy
import logging
class AssetSpider(scrapy.Spider):
name = 'asset'
start_urls = ['http://mnregaweb4.nic.in/netnrega/asset_report_dtl.aspx?lflag=eng&state_name=WEST%20BENGAL&state_code=32&district_name=NADIA&district_code=3201&block_name=KRISHNAGAR-I&block_code=&panchayat_name=DOGACHI&panchayat_code=3201009009&fin_year=2020-2021&source=national&Digest=8+kWKUdwzDQA1IJ5qhD8Fw']
def parse(self, response):
i = 4
while i<2236:
assetid = response.xpath("//table[2]//tr['i']/td[2]/text()")
assetcategory = response.xpath("//table[2]//tr['i']/td[3]/text()")
schemecode = response.xpath("//table[2]//tr['i']/td[5]/text()")
link = response.xpath("//table[2]//tr['i']/td[6]/a/@href")
schemename = response.xpath("//table[2]//tr['i']/td[7]/text()")
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1

我想使用"I"变量在tr[position]的xpath中从4循环到2235。我只是不知道这是否可能!如果这是可能的,那么正确的方法是什么?我的不管用。

当然,这是可能的,并且被广泛使用
您可以使用变量格式化字符串
这有几种语法
例如,您可以这样做:

i = 4
while i<2236:
assetid_path = "//table[2]//tr[{1}]/td[2]/text()".format(i)
assetcategory_path = "//table[2]//tr[{1}]/td[3]/text()".format(i)
schemecode_path = "//table[2]//tr[{1}]/td[5]/text()".format(i)
link_path = "//table[2]//tr[{1}]/td[6]/a/@href".format(i)
schemename_path = "//table[2]//tr[{1}]/td[7]/text()".format(i)
assetid = response.xpath(assetid_path)
assetcategory = response.xpath(assetcategory_path)
schemecode = response.xpath(schemecode_path)
link = response.xpath(link_path)
schemename = response.xpath(schemename_path)
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1

而上述内容可以这样缩短:

i = 4
while i<2236:
root_path = "//table[2]//tr[{1}]".format(i)
assetid_path = root_path + "/td[2]/text()"
assetcategory_path = root_path + "/td[3]/text()"
schemecode_path = root_path + "/td[5]/text()"
link_path = root_path + "/td[6]/a/@href"
schemename_path = root_path + "/td[7]/text()"
assetid = response.xpath(assetid_path)
assetcategory = response.xpath(assetcategory_path)
schemecode = response.xpath(schemecode_path)
link = response.xpath(link_path)
schemename = response.xpath(schemename_path)
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1

但更好的方法是使用绑定变量。如下:

i = 4
while i<2236:
assetid = response.xpath("//table[2]//tr[$i]/td[2]/text()",i=i))
assetcategory = response.xpath("//table[2]//tr[$i]/td[3]/text()",i=i))
schemecode = response.xpath("//table[2]//tr[$i]/td[5]/text()",i=i)
link = response.xpath("//table[2]//tr[$i]/td[6]/a/@href",i=i)
schemename = response.xpath("//table[2]//tr[$i]/td[7]/text()",i=i)
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1

您将字符串发送到xpath,因此我建议使用格式化。。。例如:

response.xpath(f"//table[2]//tr[{i}]/td[2]/text()")

最新更新