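I want to pass a second variable (ID) to my scraper. As a Python beginner, I'm a bit stuck here. How can this be done? Here is my code:
Get all the values I need (ID, URL):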
# SQL pseudo code: get values:
SELECT
ID,
URL
...
Append all URLs to start_urls. I know row[0] will be the ID, but how do I associate it with the URL?
results = curb.fetchall()
for row in results:
    start_urls.append(row[1])
When I start a request with a URL, I need to pass the ID along with its corresponding URL, so that I can access it later in the code via self.ID.
def start_requests(self):
    for url in self.start_urls:
        if validators.url(url):
            yield scrapy.Request(
                # ID=ID,
                url=url,
                meta={'handle_httpstatus_list': [301, 302]},
                callback=self.parse_item,
            )
        else:
            print("Invalid URL ", format(url))
You should do this in start_requests(). There you can pass meta={'ID': ID, ...} to Request() to send the value on to parse_item():
def start_requests(self):
    results = curb.fetchall()
    # for url in self.start_urls:
    for row in results:
        ID = row[0]   # first column of SELECT ID, URL
        url = row[1]  # second column of SELECT ID, URL
        if validators.url(url):
            yield scrapy.Request(
                url=url,
                meta={'ID': ID, 'handle_httpstatus_list': [301, 302]},
                callback=self.parse_item,
            )
        else:
            print("Invalid URL ", format(url))
Later you can read it back in parse_item():
def parse_item(self, response):
    ID = response.meta['ID']
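
The pairing of each ID with its URL can be checked without running a spider. The following is only a minimal sketch under stated assumptions: the in-memory sqlite3 table `pages`, its sample rows, and the helpers `is_valid_url` and `build_request_args` are all hypothetical and not part of the original code. `is_valid_url` is a rough standard-library stand-in for validators.url(), not an equivalent. The sketch builds the keyword arguments that each scrapy.Request would receive, so you can see that every URL carries its own ID in meta.

```python
import sqlite3
from urllib.parse import urlparse


def is_valid_url(url):
    """Rough stand-in for validators.url(): require an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_request_args(rows):
    """Turn (ID, URL) rows into the keyword arguments each Request would get."""
    request_args = []
    for row_id, url in rows:
        if not is_valid_url(url):
            continue  # skip rows whose URL is malformed
        request_args.append({
            "url": url,
            "meta": {"ID": row_id, "handle_httpstatus_list": [301, 302]},
        })
    return request_args


# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (ID INTEGER, URL TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [(1, "https://example.com/a"), (2, "not a url"), (3, "https://example.com/b")],
)
rows = conn.execute("SELECT ID, URL FROM pages ORDER BY ID").fetchall()
request_args = build_request_args(rows)
```

In a real spider, each dictionary would be unpacked into scrapy.Request(..., callback=self.parse_item) inside start_requests().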
EDIT:
If you have no other URLs in start_urls, you can even reuse it for this and keep the whole rows in start_urls:
results = curb.fetchall()
for row in results:
    start_urls.append(row)

def start_requests(self):
    # for row in results:
    for row in self.start_urls:
        ID = row[0]   # first column of SELECT ID, URL
        url = row[1]  # second column of SELECT ID, URL
        if validators.url(url):
            yield scrapy.Request(
                url=url,
                meta={'ID': ID, 'handle_httpstatus_list': [301, 302]},
                callback=self.parse_item,
            )
        else:
            print("Invalid URL ", format(url))
You can even assign the query result to it directly:
start_urls = curb.fetchall()

def start_requests(self):
    for row in self.start_urls:
        ID = row[0]   # first column of SELECT ID, URL
        url = row[1]  # second column of SELECT ID, URL
        # ... code ...
It should be added to the meta parameter.
def start_requests(self):
    for url in self.start_urls:
        if validators.url(url):
            yield scrapy.Request(
                # ID=ID,
                url=url,
                meta={'handle_httpstatus_list': [301, 302]},  # <<<< Add ID here.
                callback=self.parse_item,
            )
        else:
            print("Invalid URL ", format(url))
The correct way is:
def start_requests(self):
    for url in self.start_urls:
        if validators.url(url):
            yield scrapy.Request(
                url=url,
                # ID must already hold the value that belongs to this url
                meta={'ID': ID, 'handle_httpstatus_list': [301, 302]},
                callback=self.parse_item,
            )
        else:
            print("Invalid URL ", format(url))
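
The loop above iterates over URLs only, so the ID has to be looked up for each URL somewhere. One possible way, shown here as a sketch rather than as the method either answer intended, is to keep a URL-to-ID dictionary next to start_urls. The sample rows and the names `id_by_url` and `request_kwargs` are hypothetical, and the sketch assumes every URL appears only once (a repeated URL would keep only the last ID).

```python
# Hypothetical rows as returned by fetchall(): (ID, URL) tuples.
rows = [(10, "https://example.com/a"), (20, "https://example.com/b")]

# start_urls stays a plain list of URL strings.
start_urls = [url for _row_id, url in rows]

# Lookup table from URL to its ID; assumes URLs are unique.
id_by_url = {url: row_id for row_id, url in rows}


def request_kwargs(urls, ids):
    """Yield the url and meta that each scrapy.Request would receive."""
    for url in urls:
        yield {
            "url": url,
            "meta": {"ID": ids[url], "handle_httpstatus_list": [301, 302]},
        }


planned = list(request_kwargs(start_urls, id_by_url))
```

Inside a spider, the same lookup would sit in start_requests(), with ID = id_by_url[url] placed before the scrapy.Request(...) call.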