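Because of the historical-data limits on the CoinMarketCap API plans, I am turning to a web scraper instead.
However, despite reading the (rather poor) documentation on attributes, I am stuck at the first hurdle.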
import json
import requests
from bs4 import BeautifulSoup
r = requests.get('https://coinmarketcap.com/historical/20210905/')
soup = BeautifulSoup(r.text, 'lxml')
print(soup)
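The output contains the data I am trying to scrape. The data I want to get:
The market cap, price, and circulating supply of BTC on 5 September 2021.
The data appears in the output shortly after <script id="__NEXT_DATA__" type="application/json">, so I assumed that using __NEXT_DATA__ as the id attribute would let me access it. Unfortunately, it did not.
An example of the data structure that holds the data: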
"listingHistorical":{"data":[{"id":1,"name":"Bitcoin","symbol":"BTC","slug":"bitcoin","num_market_pairs":8848,"date_added":"2013-04-28T00:00:00.000Z","tags":["mineable","pow","sha-256","store-of-value","state-channels","coinbase-ventures-portfolio","three-arrows-capital-portfolio","polychain-capital-portfolio","binance-labs-portfolio","arrington-xrp-capital","blockchain-capital-portfolio","boostvc-portfolio","cms-holdings-portfolio","dcg-portfolio","dragonfly-capital-portfolio","electric-capital-portfolio","fabric-ventures-portfolio","framework-ventures","galaxy-digital-portfolio","huobi-capital","alameda-research-portfolio","a16z-portfolio","1confirmation-portfolio","winklevoss-capital","usv-portfolio","placeholder-ventures-portfolio","pantera-capital-portfolio","multicoin-capital-portfolio","paradigm-xzy-screener"],"max_supply":21000000,"circulating_supply":18807550,"total_supply":18807550,"platform":null,"cmc_rank":1,"last_updated":"2021-09-05T23:00:00.000Z","quote":{"BTC":{"price":1,"volume_24h":585906.8067215424,"percent_change_1h":0,"percent_change_24h":0,"percent_change_7d":0,"market_cap":18807550,"fully_diluted_market_cap":null,"last_updated":"2021-09-05T23:59:03.000Z"},"USD":{"price":51753.41192620951,"volume_24h":30322676318.63,"percent_change_1h":-0.159917099159,"percent_change_24h":3.621580803777,"percent_change_7d":5.987281074996,"market_cap":973354882472.7817,"last_updated":"2021-09-05T23:00:00.000Z"}},"rank":1,"noLazyLoad":true},
Is there a simple solution?
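This only works for the listing table, which is loaded in full on the page.
https://coinmarketcap.com/historical/20210905/
Here 20210905 is the date (2021-09-05). Replace it with the date you want and the page shows the data for that day, for example https://coinmarketcap.com/historical/20210101/. Then scrape the page and extract the JSON data.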
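As a small sketch of the date substitution described above, the URL can be built from a date object. The helper name historical_url and the constant BASE_URL are hypothetical, introduced only for illustration; the URL pattern is the one shown in this answer.

```python
from datetime import date

# URL prefix taken from the examples in this answer.
BASE_URL = "https://coinmarketcap.com/historical/"


def historical_url(day: date) -> str:
    """Return the snapshot URL for `day`, with the date formatted as YYYYMMDD."""
    return f"{BASE_URL}{day.strftime('%Y%m%d')}/"


print(historical_url(date(2021, 9, 5)))
# prints https://coinmarketcap.com/historical/20210905/
```

Building the string with strftime keeps the zero padding for single-digit months and days, which is easy to get wrong when concatenating by hand.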
You can try this:
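import json
import requests
from bs4 import BeautifulSoup
r = requests.get('https://coinmarketcap.com/historical/20210905/')
soup = BeautifulSoup(r.text, 'lxml')
# The tag's type is application/json (not application/ld+json), as shown in the question.
data = json.loads(soup.find('script', type='application/json', id='__NEXT_DATA__').text)
# 'listingHistorical' may be nested deeper in the payload; adjust the key path if this raises KeyError.
historical_data = data['listingHistorical']['data']
print(historical_data)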
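The excerpt in the question does not show where the listingHistorical object sits inside the parsed __NEXT_DATA__ payload, so a fixed key path may not match. The following is a minimal sketch, not a definitive implementation, that searches the parsed JSON for that key and then pulls out the three BTC fields the question asks for. The helpers find_key and btc_snapshot are hypothetical names, and the outer keys of the sample payload are placeholders; only the listingHistorical record (trimmed) and its values come from the question.

```python
def find_key(node, key):
    """Depth-first search for the first value stored under `key` in nested dicts and lists."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def btc_snapshot(payload):
    """Return the USD price, USD market cap and circulating supply of BTC from the payload."""
    listing = find_key(payload, "listingHistorical")
    if listing is None:
        raise KeyError("listingHistorical not found in payload")
    btc = next(row for row in listing["data"] if row["symbol"] == "BTC")
    usd = btc["quote"]["USD"]
    return {
        "price": usd["price"],
        "market_cap": usd["market_cap"],
        "circulating_supply": btc["circulating_supply"],
    }


# Sample payload: the outer keys are placeholders (assumption); the inner
# record is a trimmed copy of the structure quoted in the question.
sample = {
    "props": {
        "placeholder": {
            "listingHistorical": {
                "data": [
                    {
                        "id": 1,
                        "name": "Bitcoin",
                        "symbol": "BTC",
                        "circulating_supply": 18807550,
                        "quote": {
                            "USD": {
                                "price": 51753.41192620951,
                                "market_cap": 973354882472.7817,
                            }
                        },
                    }
                ]
            }
        }
    }
}

print(btc_snapshot(sample))
```

With a live page, the dictionary returned by json.loads on the script tag's contents would be passed to btc_snapshot in place of sample.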