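The link redirects me to an archive page listing other popular stories: https://www.coindesk.com/news/babel-finance-bets-on-longtime-fintech-hand-to-help-navigate-regulatory-landscape. The `news` segment between `.com` and `babel` should not be in the link, and I think it is the reason the headline link redirects to another page.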
```python
from bs4 import BeautifulSoup
import requests

base_url = 'https://www.coindesk.com/news'
source = requests.get(base_url).text
soup = BeautifulSoup(source, "html.parser")
articles = soup.find_all(class_='list-item-card post')
#print(len(articles))
#print(articles)
for article in articles:
    headline = article.h4.text.strip()
    link = base_url + article.find_all("a")[1]["href"]
    text = article.find(class_="card-text").text.strip()
    img_url = base_url + article.picture.img['src']
    print(headline)
    print(link)
    print(text)
    print("Image " + img_url)
```
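The error occurs because you are concatenating the base link (which already ends in `/news`) with an `href` that is already an absolute path starting at the site root, so `news` ends up in front of the article slug.
To prevent this, you can use `urllib.parse.urljoin()`, which resolves the `href` against the base URL instead of simply appending it.
In your example, this should fix the problem: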
```python
from urllib.parse import urljoin

link = urljoin(base_url, article.find_all("a")[1]["href"])
```
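To see the difference between the two approaches, here is a minimal sketch that compares plain concatenation with `urljoin()`. The `href` values are assumptions made up for illustration (the first is inferred from the broken link in the question, the image URL is hypothetical); the real page may return different ones.

```python
from urllib.parse import urljoin

base_url = "https://www.coindesk.com/news"

# Assumed href values, for illustration only.
root_relative = "/babel-finance-bets-on-longtime-fintech-hand-to-help-navigate-regulatory-landscape"
absolute = "https://static.example.com/img/photo.jpg"  # hypothetical image URL

# Plain concatenation keeps the "/news" segment of the base URL.
concatenated = base_url + root_relative

# urljoin resolves a path starting with "/" against the site root,
# so the "/news" segment of the base URL is dropped.
joined_root = urljoin(base_url, root_relative)

# An href that is already a full URL is returned unchanged.
joined_abs = urljoin(base_url, absolute)

print(concatenated)
print(joined_root)
print(joined_abs)
```

Because `urljoin()` leaves full URLs untouched, the same call could probably be used for `img_url` as well, whether the `src` attribute holds a path or a complete URL.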