Python & BS4 pagination loop



I'm new to web scraping, and I'm trying to scrape https://www.metrocuadrado.com/bogota.

The idea is to extract all the information. So far I've only managed to do it for a single page, but I don't know how to handle the pagination. Is there any way to do this based on the code I already have?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
# opening up connection, grabbing html
my_url = 'https://www.metrocuadrado.com/bogota'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parser
page_soup = soup(page_html, "html.parser")

# grabs each product
containers = page_soup.findAll("div",{"class":"detail_wrap"})
filename = "metrocuadrado.csv"
f = open(filename, "w")
headers = "propertytype, businestype, cityname, neighborhood, description, price, area\n"
f.write(headers)

for container in containers:
    property_type = container["propertytype"]
    busines_type = container["businestype"]
    city_name = container["cityname"]
    neighborhood_location = container["neighborhood"]
    description = container.div.a.img["alt"]
    price_container = container.findAll("span",{"itemprop":"price"})
    price =  price_container[0].text
    area_container = container.findAll("div",{"class":"m2"})
    area = area_container[0].p.span.text
    print("property_type: " + property_type)
    print("busines_type: " + busines_type)
    print("city_name: " + city_name)
    print("neighborhood_location: " + neighborhood_location)
    print("description: " + description)
    print("price: " + price)
    print("area: " + area)
    f.write(property_type + "," + busines_type + "," + city_name + "," + neighborhood_location + "," + description.replace(",", "|") + "," + price + "," + area + "\n")
f.close()

You will need to scrape each page (probably in a loop). Do that by figuring out the request that fetches page 2, page 3, and so on. You can work this out by looking at the page source, or by opening the developer tools in your browser and watching the network calls.
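A minimal sketch of such a loop, built on the question's code. It assumes a hypothetical `?page=N` query parameter — the real parameter name (or a JSON endpoint the site calls instead) must be confirmed in the browser's network tab first:

```python
from urllib.request import urlopen

BASE_URL = "https://www.metrocuadrado.com/bogota"


def paged_url(base, page):
    # Hypothetical paging scheme: page 1 is the bare URL, later pages add ?page=N.
    # Verify the actual parameter in the browser's dev tools / network tab.
    return base if page == 1 else f"{base}?page={page}"


def scrape_page(url):
    # Fetch one results page and return its listing containers.
    from bs4 import BeautifulSoup  # imported lazily so paged_url works without bs4
    with urlopen(url) as resp:
        page_soup = BeautifulSoup(resp.read(), "html.parser")
    return page_soup.find_all("div", {"class": "detail_wrap"})


def scrape_all(max_pages=10):
    # Walk the pages until one comes back empty (or max_pages is reached).
    containers = []
    for page in range(1, max_pages + 1):
        found = scrape_page(paged_url(BASE_URL, page))
        if not found:
            break
        containers.extend(found)
    return containers
```

Each container returned by `scrape_all()` can then be processed exactly as in the single-page version, writing one CSV row per listing.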
