How to scrape the next page with Scrapy



Here is my Scrapy code. I don't know what my mistake is, but it only scrapes the first page. How can I scrape and traverse through the pages? Is there another way to scrape the next page?

import scrapy


class HurriyetEmlakPage(scrapy.Spider):
    name = 'hurriyetspider'
    allowed_domain = 'hurriyetemlak.com'
    start_urls = ['https://www.hurriyetemlak.com/satilik']

    def parse(self, response):
        fiyat = response.xpath('//div[@class="list-view-price"]//text()').extract()
        durum = response.xpath('//div[@class="middle sibling"]//div[@class="left"]//text()').extract()
        oda_sayisi = response.xpath('//span[@class="celly houseRoomCount"]//text()').extract()
        metrekare = response.xpath('//span[@class="celly squareMeter list-view-size"]//text()').extract()
        bina_yasi = response.xpath('//span[@class="celly buildingAge"]//text()').extract()
        bulundugu_kat = response.xpath('//span[@class="celly floortype"]//text()').extract()
        konum = response.xpath('//div[@class="list-view-location"]//text()').extract()
        scraped_info = {
            'fiyat': fiyat,
            'durum': durum,
            'oda_sayisi': oda_sayisi,
            'metrekare': metrekare,
            'bina_yasi': bina_yasi,
            'bulundugu_kat': bulundugu_kat,
            'konum': konum
        }
        yield scraped_info

        next_page_url = response.xpath('//li[@class="next-li pagi-nav"]//a').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)
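For reference, one likely culprit in the code above is the next-page XPath: extracting `//a` returns the serialized `<a>` element markup rather than a URL, so `urljoin` never gets a usable link. A minimal sketch of the tail of `parse`, assuming the same `li.next-li.pagi-nav` markup on the listing page, that pulls the `href` attribute instead:

        # Hypothetical fix sketch (not from the original post): select the href
        # attribute of the "next" link rather than the whole <a> element.
        next_page_url = response.xpath('//li[@class="next-li pagi-nav"]//a/@href').extract_first()
        if next_page_url:
            # urljoin resolves a relative href against the current page URL
            yield scrapy.Request(url=response.urljoin(next_page_url), callback=self.parse)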

Actually, you can simply generate a url list like this:

url_list = [f"https://www.hurriyetemlak.com/satilik?page={page}" for page in range(1,7326)]

Output

['https://www.hurriyetemlak.com/satilik?page=1',
'https://www.hurriyetemlak.com/satilik?page=2',
'https://www.hurriyetemlak.com/satilik?page=3',
'https://www.hurriyetemlak.com/satilik?page=4',
'https://www.hurriyetemlak.com/satilik?page=5',
...]
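
A minimal sketch of how that list could be plugged into a spider as its start_urls, assuming the ?page= parameter stays valid up to page 7325; the spider name is made up here and the item XPaths are the ones from the question:

import scrapy


class HurriyetEmlakListPages(scrapy.Spider):
    # Hypothetical spider name for illustration only
    name = 'hurriyetspider_pages'
    # One request per listing page; Scrapy schedules these concurrently,
    # so no next-page link needs to be followed inside parse()
    start_urls = [
        f"https://www.hurriyetemlak.com/satilik?page={page}"
        for page in range(1, 7326)
    ]

    def parse(self, response):
        yield {
            'fiyat': response.xpath('//div[@class="list-view-price"]//text()').extract(),
            'konum': response.xpath('//div[@class="list-view-location"]//text()').extract(),
            # ... remaining fields exactly as in the question's parse()
        }

The trade-off of this approach is that the page count (7325) is hard-coded; if the site adds or removes listing pages, the range has to be adjusted, whereas following the pagination link adapts automatically.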
