Why can't Beautiful Soup find an element on the page?



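I'm stuck here. I've tried many ways to get this working, but at this point I'm just confused.

I'm trying to parse this page: https://www.basketball-reference.com/teams/NYK/2021.html

I'm looking for a specific table using the code below: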

from urllib.request import urlopen
from bs4 import BeautifulSoup
year = 2021
team = "NYK"
team_url = f"https://www.basketball-reference.com/teams/{team}/{year}.html"
html = urlopen(team_url)
soup = BeautifulSoup(html, 'html.parser')
tbl = soup.find('table', {'id': 'team_misc'})
print(tbl)

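My output is an empty list, [].

When I inspect the page, a table with the id team_misc exists. I can see it with my own eyes. Yet my code returns nothing. Is there an obvious reason? In the interest of time I won't list everything I've tried, but if anyone offers a suggestion, I'll say whether I've already tried it.

Thanks again!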

This will get the table you identified. You need to download chromedriver.exe into your working directory or provide the correct path to it.

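import time
from bs4 import BeautifulSoup  # needed to parse the rendered page source
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome without opening a browser window
chrome_options = Options()
chrome_options.add_argument("--headless")
year = 2021
team = "NYK"
team_url = f"https://www.basketball-reference.com/teams/{team}/{year}.html"
# Selenium 3 style call. Recent Selenium 4 releases instead expect
# webdriver.Chrome(service=Service('chromedriver.exe'), options=chrome_options),
# with Service imported from selenium.webdriver.chrome.service.
driver = webdriver.Chrome('chromedriver.exe', options=chrome_options)
driver.get(team_url)
time.sleep(5)  # give the page's JavaScript time to run
html = driver.page_source
driver.quit()  # shut down the browser process
soup = BeautifulSoup(html, "html.parser")
tbl = soup.find('table', {'id': 'team_misc'})
print(tbl)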

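Because the table you are looking for sits inside an HTML comment, the parser never sees it as a tag. One possible solution is to collect the comments, parse each one as HTML, and print the table when its id matches.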


from urllib.request import urlopen
from bs4 import BeautifulSoup, Comment #import the Comment object
year = 2021
team = "NYK"
team_url = f"https://www.basketball-reference.com/teams/{team}/{year}.html"
html = urlopen(team_url)
soup = BeautifulSoup(html, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    ele = BeautifulSoup(c.strip(), 'html.parser')
    if tbl := ele.find("table"):
        if (tbl_id := tbl.get("id")) == "team_misc":
            print(tbl)
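
To see why the first attempt comes back empty without hitting the site, here is a minimal offline sketch of the same idea. It uses only the standard library's html.parser rather than Beautiful Soup, and everything in it (SAMPLE_HTML, TableIdCollector, find_table_ids) is made up for illustration; the fragment only imitates a table wrapped in a comment and is not copied from the real page.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the table exists only inside an HTML comment.
SAMPLE_HTML = """
<div id="all_team_misc">
<!--
<table id="team_misc"><tr><td>example</td></tr></table>
-->
</div>
"""


class TableIdCollector(HTMLParser):
    """Collect the id of every <table> start tag the parser encounters."""

    def __init__(self, parse_comments=False):
        super().__init__()
        self.parse_comments = parse_comments
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_ids.append(dict(attrs).get("id"))

    def handle_comment(self, data):
        # Comment bodies arrive as plain text; markup inside them is only
        # seen if we explicitly feed that text to a second parser.
        if self.parse_comments:
            inner = TableIdCollector()
            inner.feed(data)
            inner.close()
            self.table_ids.extend(inner.table_ids)


def find_table_ids(html, parse_comments=False):
    """Return table ids found in html, optionally looking inside comments."""
    parser = TableIdCollector(parse_comments=parse_comments)
    parser.feed(html)
    parser.close()
    return parser.table_ids


print(find_table_ids(SAMPLE_HTML))                       # comments ignored
print(find_table_ids(SAMPLE_HTML, parse_comments=True))  # comments re-parsed
```

The first call finds no tables because the markup inside the comment is never tokenized as tags; the second call finds team_misc once the comment text is parsed in its own right, which is what the Beautiful Soup loop above does with Comment objects.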
