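I'm a newbie learning Python. I'm using BeautifulSoup and Requests to scrape "https://batdongsan.com.vn/nha-dat-ban-tp-hcm" to collect housing price data for my hometown, but I keep getting blocked by a 403 error even after setting a User-Agent header. Here is my code: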
import requests

url3 = "https://batdongsan.com.vn/nha-dat-ban-tp-hcm"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49"}
page = requests.get(url3, headers=headers)
print(page)
Result: <Response [403]>
Has anyone tried and successfully gotten around this same problem? Any help is much appreciated.
Many thanks
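import cloudscraper
from bs4 import BeautifulSoup

scraper = cloudscraper.create_scraper()
# Pass the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(scraper.get("https://batdongsan.com.vn/nha-dat-ban-tp-hcm").text, "html.parser")
print(soup.text)  # do what you want with the response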
You can install cloudscraper with pip install cloudscraper.
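Since the site may still refuse some requests, it can help to check the status code before parsing, so a 403 does not get parsed as if it were the listing page. Below is a minimal sketch of that idea; `fetch_html` is a hypothetical helper name (not part of cloudscraper or Requests), and it only assumes that the `get` callable returns an object with `status_code` and `text`, as `requests.get` and `scraper.get` do.

```python
def fetch_html(get, url):
    """Fetch `url` with a requests-style `get` callable and return the HTML text.

    `fetch_html` is a hypothetical helper for illustration. It raises
    RuntimeError when the server does not answer with HTTP 200, for example
    the 403 described in the question.
    """
    response = get(url)
    if response.status_code != 200:
        raise RuntimeError(f"{url} returned HTTP {response.status_code}")
    return response.text


# Possible usage (needs cloudscraper and beautifulsoup4 installed):
#
#   import cloudscraper
#   from bs4 import BeautifulSoup
#
#   scraper = cloudscraper.create_scraper()
#   html = fetch_html(scraper.get, "https://batdongsan.com.vn/nha-dat-ban-tp-hcm")
#   soup = BeautifulSoup(html, "html.parser")
```

Passing the `get` callable in, rather than hard-coding it, lets you swap between `requests.get` and `scraper.get` to compare how the site responds to each.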