How to use Python BeautifulSoup when the website URL does not change

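I want to use BeautifulSoup (or another web scraping tool) to extract data from a website, but I'm struggling because the site's URL is the same before and after you log in as a user. I'd rather not share the site address publicly here, but I'll post it in a comment below if needed. As a simple example, let's use "example.com" for reference: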

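When you first navigate to example.com, the URL is exactly "example.com". To log in, the user clicks the login button and is taken to "example.com/login". The problem is that after a successful login, the URL goes back to "example.com" even though the HTML has changed. When I try to fetch the site's HTML with BS4, I get the pre-login HTML, although I need the post-login HTML.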

Here is what I have:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.example.com/').text
soup = BeautifulSoup(source, 'html5lib')
# 'pointer' is the class I'm trying to search for, although I am not able to
# find it because it is not part of the **pre-log-in** HTML; the class is
# only part of the HTML after logging in.
name = soup.find(class_='pointer')
print(soup.prettify())

Does anyone know how I can solve this?

Thanks

How about logging in with Selenium, then passing the page source to BeautifulSoup and working from there? That is probably the easiest way to achieve this.
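
A rough sketch of that idea, in case it helps. The form field names ("username", "password"), the submit-button selector, the login URL and the credentials are all placeholders I made up, since the real site was not shared; you would need to inspect the actual login form and adjust them. It also assumes Selenium 4 with Chrome available.

```python
from bs4 import BeautifulSoup


def fetch_logged_in_html(login_url, username, password, wait_class="pointer"):
    """Log in through a real browser and return the post-login HTML."""
    # Imported here so find_by_class below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        # Hypothetical field names and selector: check the real login form.
        driver.find_element(By.NAME, "username").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
        # Wait for an element that only exists in the logged-in page.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, wait_class))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if login or the wait fails.
        driver.quit()


def find_by_class(html, class_name):
    """Return the first element carrying the given CSS class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(class_=class_name)


# Usage (needs Chrome and a real account; values are placeholders):
# html = fetch_logged_in_html("https://www.example.com/login", "my_user", "my_password")
# print(find_by_class(html, "pointer"))
```

The browser keeps the login session, so the page source it returns is the logged-in version even though the URL is the same as before. The explicit wait is there because the redirect back to the home page may take a moment.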
