Can't get the whole HTML page with Beautiful Soup



I'm trying to get the content of this web page: https://www.zillow.com/homes/for_rent/1-_beds/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22mapBounds%22%3A%7B%22west%22%3A-122.67022170019531%2C%22east%22%3A-122.19643629980469%2C%22sout%22%3A37.615282466144976%2C%2north%22%3A3793495488175342%7D%2C%22mapZoom%22%3A11%2C%22SMapVisible%22%3Atrue%2C%2filterState%22%3A%7B%22price%22%3A%7%22max%22%3A872627%7D%2C%22beds%22%3A%7B%22min%22%3A1%7D%22%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22mp%22%3A%7B%22max%22%3A3000%7D+2C%22c%22%3A%7%B%22value/22%3False%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22cmsn%22%3A%7B%22c%22%3Afalse%7D+2C%22fsba%22%3A%7%22value%220%3Afalse%7%7D%2C%22isListVisible%22%3Arue%7D

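I can't get all of it. Many elements come back empty. I've been told this is because the content is generated by JavaScript, which bs4 can't execute, so I would have to use Selenium. But I'd like to do it with bs4, and I know there is a way. I've also been told it's because I'm not in the correct iframe, but that doesn't seem to be the case. For example, if you inspect one of the listed prices (e.g. $2,200/mo), you'll see that it sits inside a ul list, and each listed apartment is an li element of that list. But when I scrape the page with bs4, most of those li elements appear to be empty. Also, keep in mind that I'm new to web scraping and Python, so please go easy on me. Thanks.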

This is the code I use to get the page's HTML:

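import requests
from bs4 import BeautifulSoup
# Runs inside a class method; URL and headers are defined elsewhere.
self.response = requests.get(url=URL, headers=headers)
self.html_doc = self.response.text
self.soup = BeautifulSoup(self.html_doc, 'html.parser')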

Yes, this site uses React. Open the Network tab of the browser developer tools in Chrome or Firefox and watch which files and requests are loaded as you browse. Inspect the call stack and the request details that lead to the data. In the DevTools Network tab I see this link: https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22mapBounds%22%3A%7B%22west%22%3A-122.83501662207031%2C%22east%22%3A-122.03164137799269%2C%22south%22%3A37.5486236026355%2C%22c%22%3A38.00126648128239%7D%2C%22mapZoom%22%3A11%2C%22sMapVisible%22%3Atrue%2C%22C%22C%22category%22%3A%22c%22c%22C%22c%2filterState%22%3A%7B%22price%22%3A%3B%22max%22%3A872627%7D%2C%22beds%22%3A%7%7B%22min%22%3A 1%7D%2C%22isForSaleForeclosure%22%3A%7B%22value%22%3Afalse%7D%2C%22每月付款%22%3A%7%22max%22%3A3000%7%D%2C%22isNewConstruction%22%3A%7B%22value%22%3Afalse%7%D%2C2%22isComingSoon%22%3A%7%2C%22isForSaleByAgent%22%3A%7B%22 value%22%3A false%7D%2C%22sortSelection%22%3A/7B%22value%22%3A%22globalrelevanceex%22%7D%7D%2C%22is ListVisible%22%3Atrue%7D&wants={%22cat2%22:[%22listResults%22,%22mapResults%22],%22cat1%22:[%22 total%22]}&requestId=6

React builds the page from this data. That is why the li elements are empty in the HTML that requests downloads: they are only filled in after the JavaScript has fetched this JSON. My English isn't good, but I hope I've helped.
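To make the idea concrete, here is a rough sketch of calling that endpoint directly instead of parsing the rendered HTML. The searchQueryState and wants parameters in the link are just URL-encoded JSON, so they can be built from plain dicts. Everything about the response layout is an assumption on my part: the key path used in extract_listings (category, then searchResults, then listResults) and the helper names are hypothetical, so check the actual JSON in the Network tab and adjust. The site may also reject requests that lack browser-like headers.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the link seen in the Network tab.
SEARCH_ENDPOINT = "https://www.zillow.com/search/GetSearchPageState.htm"


def build_search_url(query_state, wants, request_id=1):
    """URL-encode the JSON parameters, as in the request the browser sends."""
    params = {
        "searchQueryState": json.dumps(query_state, separators=(",", ":")),
        "wants": json.dumps(wants, separators=(",", ":")),
        "requestId": request_id,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def decode_query_state(url):
    """Recover the searchQueryState dict from a URL copied out of DevTools."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["searchQueryState"][0])


def extract_listings(payload, category="cat1"):
    """Return the list of listing dicts from the decoded JSON response.

    ASSUMPTION: the path category -> searchResults -> listResults is a guess
    at the response layout; verify it against the real response.
    """
    results = payload.get(category, {}).get("searchResults", {})
    return results.get("listResults", [])


def fetch_listings(url, headers, category="cat1"):
    """Request the JSON endpoint and return its listings (needs network access)."""
    import requests  # third-party; imported here so the helpers above run without it

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return extract_listings(response.json(), category)
```

Usage would look something like fetch_listings(build_search_url(state, wants), headers), where state is a dict you get from decode_query_state on a URL copied from your own browser session, and headers are the same ones you already pass to requests.get. BeautifulSoup is not needed for this part, because the response is JSON rather than HTML.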
