Extracting results from the main page with Scrapy



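I have been trying to extract all the property results from the main search page (https://just.property/property/residential/sale/cape-town-western-cape/). Each property is stored in its own div, so I'm not sure how to access all of them. I tried accessing a single property with: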

results = response.xpath('//div[@class="col-md-8"]/div[@class="results"]/div[@id="2259870"]/div[@class="prop-details"]/text()').getall()

but I always get an empty list. Any suggestions? Thanks in advance!

The search results on that site are not part of the initial HTML; they come from an API call:

https://just.property/includes/doSearch.php

You need to simulate this call to collect the information you need.

It should look something like this:

import requests

# Request headers copied from the browser's network tab
headers = {
    'authority': 'just.property',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'x-requested-with': 'XMLHttpRequest',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'origin': 'https://just.property',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://just.property/property/residential/sale/cape-town-western-cape/',
    'accept-language': 'en-US,en;q=0.9,ru-RU;q=0.8,ru;q=0.7,uk;q=0.6,en-GB;q=0.5',
    'cookie': 'PHPSESSID=7347564fa5f107ddd1d368a28ef4fc8e; _ga=GA1.2.21988185.1600678549; _gid=GA1.2.1286094223.1600678549; search=%7B%22area%22%3A%22cape-town-western-cape%22%2C%22type%22%3A%22Sale%22%2C%22property_types%22%3A%22%22%2C%22min%22%3A%22100000%22%2C%22max%22%3A%2235000000%22%2C%22beds%22%3A%220%22%2C%22baths%22%3A%220%22%2C%22start%22%3A%220%22%2C%22limit%22%3A%2220%22%2C%22order%22%3A%22none%22%2C%22zone%22%3A%22residential%22%2C%22on_show%22%3A%220%22%7D; _gat=1',
}

# Form fields sent with the search request
data = {
    'area': 'cape-town-western-cape',
    'type': 'Sale',
    'min': '100000',
    'max': '35000000',
    'zone': 'residential',
    'beds': '0',
    'baths': '0',
    'start': '0',
    'limit': '20',
}

# POST the form to the search endpoint
response = requests.post('https://just.property/includes/doSearch.php', headers=headers, data=data)
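The form above carries `start` and `limit` fields, which look like paging controls. Assuming (this is not confirmed by the site) that `start` is a zero-based offset and `limit` is the page size, a rough sketch for generating the form data for several pages could look like the following. The names `BASE_FORM` and `page_payloads` are made up for illustration.

```python
from urllib.parse import urlencode

# Search fields taken from the request above, minus the paging fields
BASE_FORM = {
    'area': 'cape-town-western-cape',
    'type': 'Sale',
    'min': '100000',
    'max': '35000000',
    'zone': 'residential',
    'beds': '0',
    'baths': '0',
}


def page_payloads(pages, limit=20):
    """Yield one form-data dict per page.

    Assumption: 'start' is a zero-based offset and 'limit' is the page size.
    """
    for page in range(pages):
        payload = dict(BASE_FORM)
        payload['start'] = str(page * limit)
        payload['limit'] = str(limit)
        yield payload


payloads = list(page_payloads(pages=3))
for payload in payloads:
    # Show the URL-encoded body that would be sent for each page
    print(urlencode(payload))
```

Each payload can then be passed as `data=payload` to `requests.post`, or as `formdata=payload` to `scrapy.FormRequest` if you want to stay inside a Scrapy spider.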
