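pandas read_html fails when scraping the Pokemon data table

I am trying to practice using pandas' read_html function by scraping a table, but I ran into an error. My code is below: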

import pandas as pd
url = "https://www.pokemondb.net/pokedex/all"
dfs = pd.read_html(url)

The code above raised an error, so I tried the code below, but it did not work either.

from bs4 import BeautifulSoup
import pandas as pd
import requests
url = "https://www.pokemondb.net/pokedex/all"
html = requests.get(url)
soup = BeautifulSoup(html.text, "html.parser")
dfs = pd.read_html(soup.table)

I don't know what is wrong. Can someone enlighten me?

Thanks!

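The documentation for read_html says that it does not work with https URLs.

That is why your first version fails.

In the second version you don't need BeautifulSoup.

read_html() already uses bs4, lxml, or html5lib internally; see the flavor option in the documentation to choose which one.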
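As a rough sketch of the flavor option mentioned above, the snippet below parses a small inline table and lets the caller pick the parser. The helper name parse_tables and the sample markup are assumptions made up for illustration (this is not the real Pokedex page), and each flavor only works if the corresponding package is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample markup, standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Name</th><th>HP</th></tr></thead>
  <tbody>
    <tr><td>Bulbasaur</td><td>45</td></tr>
    <tr><td>Charmander</td><td>39</td></tr>
  </tbody>
</table>
"""


def parse_tables(html, flavor=None):
    """Return every <table> found in `html` as a list of DataFrames.

    flavor=None lets pandas try lxml first and fall back to bs4 + html5lib;
    pass "lxml", "bs4" or "html5lib" to force one parser.
    """
    # Wrapping the markup in StringIO avoids passing a raw HTML string,
    # which newer pandas versions deprecate.
    return pd.read_html(StringIO(html), flavor=flavor)


if __name__ == "__main__":
    print(parse_tables(SAMPLE_HTML)[0])
```

Calling parse_tables(SAMPLE_HTML, flavor="bs4") would route the same markup through BeautifulSoup, which can be more forgiving of malformed HTML.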

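from io import StringIO

import requests
import pandas as pd

url = "https://www.pokemondb.net/pokedex/all"
# Download the page with requests, which handles https.
html = requests.get(url)
# Wrap the markup in StringIO; newer pandas deprecates raw HTML strings.
dfs = pd.read_html(StringIO(html.text))
print(dfs)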

I hope this helps.

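table = soup.find_all('table', attrs={'id': 'pokedex'})

Then convert the table to a string and pass it to read_html (not read_table, which expects a delimited text file rather than HTML).

dfs = pd.read_html(StringIO(str(table)))  # requires: from io import StringIO

That will give you the output.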
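If the goal is only to pick out the table with id pokedex, read_html can do that filtering itself through its attrs parameter, so the manual lookup is optional. A minimal sketch under that assumption, using made-up markup with two tables instead of the live page:

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second has id="pokedex".
PAGE = """
<table id="other">
  <thead><tr><th>Key</th><th>Value</th></tr></thead>
  <tbody><tr><td>a</td><td>1</td></tr></tbody>
</table>
<table id="pokedex">
  <thead><tr><th>Name</th><th>Type</th></tr></thead>
  <tbody>
    <tr><td>Charmander</td><td>Fire</td></tr>
    <tr><td>Squirtle</td><td>Water</td></tr>
  </tbody>
</table>
"""

# attrs restricts the result to tables whose HTML attributes match.
dfs = pd.read_html(StringIO(PAGE), attrs={"id": "pokedex"})
pokedex = dfs[0]
print(pokedex)
```

Without attrs, the same call would return both tables and the caller would have to pick the right one by position.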
