bs4 scraper not giving the actual link



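I'm trying to scrape the download link from 2lib.org, but for some reason the link that gets printed is not the one I see when I inspect the page manually. Here is my code: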

import requests
from bs4 import BeautifulSoup

URL = 'https://2lib.org/book/1844071/8add88/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'lxml')
# Select the download button and print its href
for tag in soup.select('a.btn-primary.dlButton.addDownloadedBook'):
    print(f"2lib.org{tag['href']}")

Right now it prints a link ending in 4da009, which is not the link I get when I check the site manually. Am I missing something?

Thanks.

Try specifying a User-Agent header:

import requests
from bs4 import BeautifulSoup

URL = "https://2lib.org/book/1844071/8add88/"
# Send a browser-like User-Agent instead of the requests default
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "lxml")
for tag in soup.select("a.btn-primary.dlButton.addDownloadedBook"):
    print(f"https://2lib.org{tag['href']}")

Prints:

https://2lib.org/dl/1844071/e50b5e
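
To see why the header can matter: when no User-Agent is given, requests identifies itself as "python-requests/<version>", and a site may respond differently to that than to a browser. Whether 2lib.org actually does this is an assumption on my part, not something verified here. The sketch below only prepares the request without sending it, so you can inspect which User-Agent would go out. The helper name prepared_user_agent is made up for illustration.

```python
import requests

URL = "https://2lib.org/book/1844071/8add88/"
BROWSER_UA = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) "
    "Gecko/20100101 Firefox/88.0"
)


def prepared_user_agent(url, headers=None):
    """Return the User-Agent requests would send, without any network call.

    Hypothetical helper for illustration; it only builds the request.
    """
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    # prepare_request merges the session's default headers with ours
    prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


default_ua = prepared_user_agent(URL)
custom_ua = prepared_user_agent(URL, {"User-Agent": BROWSER_UA})

print(default_ua)  # the library default, starts with "python-requests/"
print(custom_ua)   # the browser-like string passed in
```

If the two printed values differ, the headers dict is being picked up, and any difference in the scraped link would then come from how the server treats each client.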
