Web scraping a tag with a CSS ID using BeautifulSoup



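I'm trying to search this website for the tag with id "advanced.2004" (which does exist). Here is the code I tried; it doesn't find the tag.

import requests
from bs4 import BeautifulSoup

webpage = requests.get('https://www.basketball-reference.com/players/j/jamesle01.html')
soup = BeautifulSoup(webpage.content, 'html.parser')
print(soup.find_all(attrs={'id': 'advanced.2004'}))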

Thanks in advance for your help!

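The problem is that the element you're looking for is inside an HTML comment, so the parser treats it as comment text rather than as a tag. To fix this, loop over every comment on the page, parse its contents with BeautifulSoup, and search for the element you need: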

import requests
from bs4 import BeautifulSoup, Comment

url = 'https://www.basketball-reference.com/players/j/jamesle01.html'
webpage = requests.get(url)
soup = BeautifulSoup(webpage.content, 'html.parser')

el = None  # stays None if no comment contains the element
# Visit every HTML comment node on the page
for comment in soup.find_all(string=lambda node: isinstance(node, Comment)):
    # Parse the comment text as its own HTML fragment
    comment_html = BeautifulSoup(comment, 'html.parser')
    el = comment_html.find(id='advanced.2004')
    if el is not None:
        break
print(el)
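
The same idea can be tried without hitting the network. The following is a minimal sketch, not the site's actual markup: the helper name find_in_comments and the sample_html string are hypothetical, made up here only to show a tag with an id sitting inside an HTML comment.

```python
from bs4 import BeautifulSoup, Comment


def find_in_comments(html, element_id):
    """Return the first element with the given id found inside any HTML comment, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # Comment is a NavigableString subclass, so a string filter can select it
    for comment in soup.find_all(string=lambda node: isinstance(node, Comment)):
        # Re-parse the comment body as its own HTML fragment
        fragment = BeautifulSoup(comment, 'html.parser')
        el = fragment.find(id=element_id)
        if el is not None:
            return el
    return None


# Hypothetical sample markup standing in for the real page (an assumption)
sample_html = """
<div id="all_advanced">
  <!--
  <table id="advanced"><tr id="advanced.2004"><td>CLE</td></tr></table>
  -->
</div>
"""

row = find_in_comments(sample_html, 'advanced.2004')
print(row)
```

Wrapping the loop in a function that returns None when nothing matches avoids relying on a loop variable that may never be assigned, and makes the lookup easy to reuse for other ids.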
