Python web scraping: 403 error even with a User-Agent header



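I'm new to Python. I'm using BeautifulSoup and Requests to scrape "https://batdongsan.com.vn/nha-dat-ban-tp-hcm" to collect housing price data for my hometown, but I'm blocked with a 403 error even after setting a User-Agent header. Here is my code: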

import requests

url3 = "https://batdongsan.com.vn/nha-dat-ban-tp-hcm"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49"}
page = requests.get(url3, headers=headers)
print(page)

Result: <Response [403]>

Has anyone run into the same problem and managed to get around it? Any help is much appreciated.

Thanks

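import cloudscraper
from bs4 import BeautifulSoup

# create_scraper() returns a requests-compatible session
scraper = cloudscraper.create_scraper()
soup = BeautifulSoup(scraper.get("https://batdongsan.com.vn/nha-dat-ban-tp-hcm").text, "html.parser")
print(soup.text)  # do what you want with the response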

You can install cloudscraper with pip install cloudscraper.
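Since the original problem was a silent 403, it may help to check the status code before handing the HTML to BeautifulSoup. The following is only a minimal sketch: `fetch_html`, `main`, and the 30-second timeout are hypothetical names and values chosen for illustration, and it assumes the scraper object behaves like a requests session (which is how the answer above uses the object returned by `cloudscraper.create_scraper()`).

```python
def fetch_html(scraper, url, timeout=30):
    """Return the page HTML, or raise RuntimeError if the status is not 200.

    `scraper` is any object with a requests-style .get() method, for example
    a requests.Session or the object returned by cloudscraper.create_scraper().
    """
    response = scraper.get(url, timeout=timeout)
    if response.status_code != 200:
        # Surface the block (e.g. 403) instead of parsing an error page.
        raise RuntimeError(f"{url} returned HTTP {response.status_code}")
    return response.text


def main():
    # Imported here so fetch_html can be reused without these packages installed.
    import cloudscraper
    from bs4 import BeautifulSoup

    url = "https://batdongsan.com.vn/nha-dat-ban-tp-hcm"
    html = fetch_html(cloudscraper.create_scraper(), url)
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title)
```

Calling `main()` performs the real request. If the site still answers 403, the RuntimeError makes that visible right away rather than leaving you with an empty or misleading parse result.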
