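Why can't pandas web scraping print any tables from this website?

I wrote this simple code using pandas web scraping; it is supposed to extract data from this stock website. However, when I run it, it says "list index out of range", which suggests that no tables were found on the site. If you open the website, you can clearly see that there are multiple tables. Can someone explain how I can fix this?

Website link: https://www.hkex.com.hk/Products/Listed-Derivatives/Single-Stock/Stock-Options?sc_lang=en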

import pandas as pd
url = 'https://www.hkex.com.hk/Products/Listed-Derivatives/Single-Stock/Stock-Options?sc_lang=en'
dfs = pd.read_html(url)
print(len(dfs))  # number of tables found on the page (not a row count)
print(dfs[0])  # prints the first table

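From pandas' point of view, the tables on that page have some inconsistencies. One workaround is to fetch the page with requests (sending a browser-like User-Agent), select the table with BeautifulSoup, and pass only that table's markup to pandas. Here is one way to get the first table on that page as a DataFrame: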

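from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://www.hkex.com.hk/Products/Listed-Derivatives/Single-Stock/Stock-Options?sc_lang=en'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
# Select only the table whose class attribute is exactly "table migrate"
spec_table = soup.select('table[class="table migrate"]')[0]
# Wrap the markup in StringIO; newer pandas versions deprecate literal HTML strings here
df = pd.read_html(StringIO(str(spec_table)))[0]
print(df[:5].to_markdown())  # to_markdown() needs the tabulate package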

This returns the dataframe:

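[Output table garbled in the source. Surviving column headers include No., SEHK Code, Underlying Stock Name, HKATS Code, Contract Size (shares), Number of Board Lots and Approved by FSC Taiwan; surviving rows mention Sun Hung Kai Properties Limited, Geely Automobile Holdings Limited (GAH), Kingdee International Software Group Co., Ltd. (KDS) and WH Group Limited (WHG).]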
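
For reference, the same select-then-parse idea can be tried offline. The sketch below is not the answerer's code: `SAMPLE_HTML`, its company names, and the helper `tables_by_selector` are made up for illustration, and it assumes pandas has an HTML parser backend such as lxml installed. It wraps each table's markup in `StringIO`, since recent pandas versions deprecate passing a literal HTML string to `pd.read_html`, and it skips tables that pandas cannot parse instead of failing on them.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page: one layout table and one
# data table that carries the "table migrate" classes.
SAMPLE_HTML = """
<html><body>
  <table class="layout"><tr><td>navigation</td></tr></table>
  <table class="table migrate">
    <thead><tr><th>No.</th><th>Code</th><th>Name</th></tr></thead>
    <tbody>
      <tr><td>1</td><td>AAA</td><td>Example One Ltd.</td></tr>
      <tr><td>2</td><td>BBB</td><td>Example Two Ltd.</td></tr>
    </tbody>
  </table>
</body></html>
"""


def tables_by_selector(html: str, selector: str) -> list:
    """Parse every <table> matching a CSS selector into its own DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    frames = []
    for table in soup.select(selector):
        try:
            # Hand pandas one table at a time, wrapped in a file-like object.
            frames.extend(pd.read_html(StringIO(str(table))))
        except ValueError:
            # read_html raises ValueError when it finds no parsable table.
            continue
    return frames


frames = tables_by_selector(SAMPLE_HTML, "table.table.migrate")
if frames:
    print(frames[0])
else:
    print("No matching tables found")
```

Checking `if frames:` before indexing avoids the "list index out of range" error from the question when nothing matches the selector.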
