BeautifulSoup returns nothing

I'm trying to learn how to scrape components from a website, specifically this one: https://genshin-impact.fandom.com/wiki/Serenitea_Pot/Load

Following guides on the internet, I collected some important elements, such as the class

"article-table sortable mw-collapsible jquery-tablesorter mw-made-collapsible"

and HTML elements like th and tb, so that I could get the table's specific content with this code:

import requests
from bs4 import BeautifulSoup

URL = "https://genshin-impact.fandom.com/wiki/Serenitea_Pot/Load"
page = requests.get(URL)
#print(page.text)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="mw-content-text")
teapot_loads = results.find_all("table", class_="article-table sortable mw-collapsible jquery-tablesorter mw-made-collapsible")
for teapot_loads in teapot_loads:
    table_head_element = teapot_loads.find("th", class_="headerSort")
    print(table_head_element)
    print()

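I seem to have written the correct element (th) and the correct class name ("headerSort"), but the program doesn't return anything, even though it raises no errors either. What am I doing wrong?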

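You can debug your code to see where things go wrong. Below is one such debugging attempt, where we keep only one class for the table and then print out the full class list of the actual element: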

import requests
from bs4 import BeautifulSoup

URL = "https://genshin-impact.fandom.com/wiki/Serenitea_Pot/Load"
page = requests.get(URL)
#print(page.text)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="mw-content-text")
# print(results)
teapot_loads = results.find_all("table", class_="article-table")
for teapot_load in teapot_loads:
    # Show the classes that requests/BeautifulSoup actually see on the table
    print(teapot_load.get_attribute_list('class'))
    table_head_element = teapot_load.find("th", class_="headerSort")
    print(table_head_element)

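Besides the element you wanted to print, this also prints the table's classes as seen by requests/BeautifulSoup: ['article-table', 'sortable', 'mw-collapsible']. After the original HTML has loaded in the page (with the original classes, which is what requests/BeautifulSoup see), the JavaScript in that page kicks in and adds new classes to the table. Because you are searching for elements that carry those dynamically added classes, the search fails.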
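To make the effect easier to see without hitting the network, here is a minimal sketch that uses a small, made-up HTML snippet as a stand-in for what requests receives. The snippet, its cell contents, and the variable names are assumptions for illustration only; the real page is larger and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the static HTML that requests
# receives, i.e. before any JavaScript has run in a browser.
STATIC_HTML = """
<div id="mw-content-text">
  <table class="article-table sortable mw-collapsible">
    <tr><th>Name</th><th>Load</th></tr>
    <tr><td>Example item</td><td>60</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# A multi-class string passed to class_ is compared against the whole
# class attribute, so the string copied from the browser does not match:
# two of its classes are missing from the static HTML.
browser_classes = (
    "article-table sortable mw-collapsible "
    "jquery-tablesorter mw-made-collapsible"
)
missing = soup.find_all("table", class_=browser_classes)

# A CSS selector naming only classes present in the static HTML matches.
found = soup.select("table.article-table.sortable")

# In this stand-in the header cells carry no "headerSort" class,
# so they are selected by tag name instead.
headers = [th.get_text(strip=True) for th in found[0].find_all("th")]

print(len(missing), len(found), headers)
```

The same idea applies to the real page: search only for classes that appear in the output of get_attribute_list('class'), not for the ones shown in the browser's inspector.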

That said, here is a more elegant way to get the table:

import pandas as pd
url = 'https://genshin-impact.fandom.com/wiki/Serenitea_Pot/Load'
dfs = pd.read_html(url)
print(dfs[1])

This returns a dataframe containing that table:

	Image	Name	Adeptal Energy
0	nan	"Floaty's Dream of the Sky"	60
1	nan	"A Guide to the Summer Forest"	60
2	nan	"A Messenger in the Summer Forest"	60
3			90
4	nan	"A Seat in the Wilderness"	20
5	nan	"Ballad Spinning Windwheel"	90
6	nan	"Between Nine Steps"	30
