Web scraping: problem scraping a table



I want to scrape the full contents of the main coins table at the URL below.

However, my code below doesn't seem to work:

import requests
import pandas as pd
url = 'https://messari.io/screener/coinbase-ventures-portfolio-34D634C4'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)

Where am I going wrong?

You can get the data directly from the source that the page renders it from:

import requests
import pandas as pd
# JSON endpoint that supplies the data shown on the page
url = 'https://data.messari.io/api/v1/markets/prices-legacy'
# Send a browser-like User-Agent with the request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'}
jsonData = requests.get(url, headers=headers).json()
# Flatten the records under the 'data' key into a DataFrame
data = pd.json_normalize(jsonData['data'])

Output:

print(data)
                                        id  ... stakingEngagedPercent
0     1e31218a-e44e-4285-820c-8282ee222035  ...                   NaN
1     21c795f5-1bfd-40c3-858e-e9d7e820c6d0  ...                   NaN
2     7dc551ba-cfed-4437-a027-386044415e3e  ...                   NaN
3     97775be0-2608-4720-b7af-f85b24c7eb2d  ...                   NaN
4     51f8ea5e-f426-4f40-939a-db7e05495374  ...                   NaN
...                                    ...  ...                   ...
1609  ff4f6990-5333-4e75-81cb-1342af9cc0a1  ...                   NaN
1610  ffae284d-cb73-44e5-8934-cb3658284e46  ...                   NaN
1611  ffaebc24-053e-428e-a84d-be836e4f8a3a  ...                   NaN
1612  ffc64018-c724-44ac-b3d0-00e33dff7615  ...                   NaN
1613  ffde2011-560a-458b-abaa-2b4f20f851a2  ...                   NaN

[1614 rows x 177 columns]
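
The large column count is presumably because pd.json_normalize flattens nested objects into dot-separated column names. Here is a rough standard-library sketch of that flattening. The `flatten` helper and the sample record are made up for illustration and do not reflect the real Messari schema:

```python
from typing import Any


def flatten(record: dict, prefix: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat: dict = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted prefix along
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical record; field names are illustrative only
sample: dict[str, Any] = {
    "id": "abc",
    "metrics": {"price": 1.5, "supply": {"circulating": 100}},
}
flat = flatten(sample)
print(flat)
```

Each nested key becomes its own column, so a handful of nested objects per record can easily expand into many columns.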

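The page is dynamic: the HTML does not contain the table. When you download it, what you get is a set of scripts that will be used to render the page.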
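
One way to confirm this before reaching for pd.read_html is to count the tags in the downloaded HTML. Here is a minimal sketch using only the standard library. The two HTML strings are made-up stand-ins, not the real page, and `count_tags` is a hypothetical helper:

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times each opening tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a dict mapping tag name to number of opening tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Made-up examples: a script-rendered shell versus a page with a real table
dynamic_page = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
static_page = "<html><body><table><tr><td>BTC</td></tr></table></body></html>"

print(count_tags(dynamic_page).get("table", 0))
print(count_tags(static_page).get("table", 0))
```

For the real page you would pass `requests.get(url).text` to `count_tags`. If the count for `table` is zero, pd.read_html has nothing to parse and raises a ValueError saying no tables were found.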
