How do I access the href value from an HTML div element using Python and Beautiful Soup?



How can I access a link inside an HTML div?

This is the HTML I am trying to scrape. I want to get the href value:

<div class="item-info-wrap">
<div class="news-feed_item-meta icon-font-before icon-espnplus-before"> <span class="timestamp">5d</span><span class="author">Field Yates</span> </div>
<h1> <a name="&amp;lpos=nfl:feed:5:news" href="/nfl/insider/story/_/id/31949666/six-preseason-nfl-trades-teams-make-imagining-deals-nick-foles-xavien-howard-more" class=" realStory" data-sport="nfl" data-mptype="story">
Six NFL trades we'd love to see in August: Here's where Foles could help, but it's not the Colts</a></h1>
<p>Nick Foles is running the third team in Chicago. Xavien Howard wants out of Miami. Let's project six logical deals.</p></div>

This is the code I have been using to try to access the href value:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.espn.com/nfl/team/_/name/phi/philadelphia-eagles').text
soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all('div', class_='item-info-wrap'):
    headline = article.h1.a.text
    print(headline)
    summary = article.p.text
    print(summary)
    try:
        link_src = article.h1.a.href  # Having difficulty getting href value
        print(link_src)
        link = f'https://espn.com/{link_src}'
    except Exception as e:
        link = None
    print(link)

For every ESPN article, the output I get is https://espn.com/None. Any help and feedback is appreciated!

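If you change line 12 of your code (the link_src assignment) as shown below, it should work. Writing article.h1.a.href makes Beautiful Soup look for a child tag named href, which does not exist, so it returns None. A tag's attributes are read with dictionary-style access instead.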

link_src = article.h1.a["href"]

For reference, see the Attributes section of https://www.crummy.com/software/BeautifulSoup/bs4/doc/
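
To make the difference concrete, here is a minimal sketch that runs against a trimmed copy of the HTML from the question, so it needs no network access. The helper name extract_links and the BASE_URL constant are illustrative assumptions, not part of the original code. It uses the built-in html.parser instead of lxml only to avoid an extra dependency, reads the attribute with .get("href") so a missing attribute yields None rather than a KeyError, and builds the absolute URL with urljoin rather than an f-string.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<div class="item-info-wrap">
<h1> <a name="&amp;lpos=nfl:feed:5:news" href="/nfl/insider/story/_/id/31949666/six-preseason-nfl-trades-teams-make-imagining-deals-nick-foles-xavien-howard-more" class=" realStory" data-sport="nfl" data-mptype="story">
Six NFL trades we'd love to see in August</a></h1>
<p>Nick Foles is running the third team in Chicago.</p></div>
"""

# Assumption: relative links on the page resolve against this host.
BASE_URL = "https://www.espn.com"


def extract_links(markup, base_url=BASE_URL):
    """Return one absolute URL (or None) per item-info-wrap div."""
    soup = BeautifulSoup(markup, "html.parser")
    links = []
    for article in soup.find_all("div", class_="item-info-wrap"):
        heading = article.h1
        anchor = heading.a if heading is not None else None
        # anchor.href would search for a child <href> tag and return None;
        # attributes are read with anchor["href"] or anchor.get("href").
        href = anchor.get("href") if anchor is not None else None
        links.append(urljoin(base_url, href) if href else None)
    return links


links = extract_links(html)
print(links)
```

Using article.h1.a["href"] as in the answer works just as well when every headline has a link. The .get form is only a defensive choice for pages where some items may lack one.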
