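I am trying to scrape the data behind the chart on this page: https://www.spglobal.com/spdji/en/indices/equity/sp-bmv-ipc/#overview
I found the JSON file behind the chart and tried to load it into pandas with the following code: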
import json
import urllib.request
import pandas as pd
url = "https://www.spglobal.com/spdji/en/util/redesign/index-data/get-performance-data-for-datawidget-redesign.dot?indexId=92330739&getchildindex=true&returntype=T-&currencycode=MXN&currencyChangeFlag=false&language_id=1"
# Download the response and parse it as JSON
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode())
df = pd.DataFrame(data, columns=['indexLevelsHolder'])
# Row 3 of the single column holds the list of time series records
Data = df.iloc[3, 0]
Doing this, I get the "Data" object, which is a list containing the time series data in JSON format:
[{'effectiveDate': 1309406400000, 'indexId': 92330714, 'effectiveDateInEst': 1309392000000, 'indexValue': 43405.82, 'monthToDateFlag': 'N', 'quarterToDateFlag': 'N', 'yearToDateFlag': 'N', 'oneYearFlag': 'N', 'threeYearFlag': 'N', 'fiveYearFlag': 'N', 'tenYearFlag': 'Y', 'allYearFlag': 'Y', 'fetchedDate': 1626573344000, 'formattedEffectiveDate': '30-Jun-2011'}, .........
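The problem is that I cannot find a way to read this JSON data and get the columns I need (effectiveDate and indexValue).
Is there any way to do this? Thanks.
You can use pd.json_normalize
to load the JSON into columns: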
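import json
import urllib.request
import pandas as pd
url = "https://www.spglobal.com/spdji/en/util/redesign/index-data/get-performance-data-for-datawidget-redesign.dot?indexId=92330739&getchildindex=true&returntype=T-&currencycode=MXN&currencyChangeFlag=false&language_id=1"
# Download the response and parse it as JSON
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode())
# The records are nested under indexLevelsHolder -> indexLevels;
# json_normalize turns the list of dicts into one row per record
df = pd.json_normalize(data["indexLevelsHolder"]["indexLevels"])
print(df)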
Prints:
effectiveDate indexId effectiveDateInEst indexValue monthToDateFlag quarterToDateFlag yearToDateFlag oneYearFlag threeYearFlag fiveYearFlag tenYearFlag allYearFlag fetchedDate formattedEffectiveDate
0 1309406400000 92330714 1309392000000 43405.820000 N N N N N N Y Y 1626574897000 30-Jun-2011
1 1309492800000 92330714 1309478400000 43693.930000 N N N N N N Y Y 1626574897000 01-Jul-2011
2 1309752000000 92330714 1309737600000 43758.130000 N N N N N N Y Y 1626574897000 04-Jul-2011
3 1309838400000 92330714 1309824000000 43513.290000 N N N N N N Y Y 1626574897000 05-Jul-2011
...and so on.
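To get only the two columns asked about, the normalized frame can be subset and the timestamp converted. The following is a minimal sketch rather than a definitive solution: it uses a small inline list (`index_levels`, a hypothetical name) with values copied from the output above in place of the live download, and it assumes that `effectiveDate` is a Unix timestamp in milliseconds, which the sample values suggest but the source does not state.

```python
import pandas as pd

# Sample records copied from the output above. With the live response this
# would be data["indexLevelsHolder"]["indexLevels"] instead.
index_levels = [
    {"effectiveDate": 1309406400000, "indexValue": 43405.82,
     "formattedEffectiveDate": "30-Jun-2011"},
    {"effectiveDate": 1309492800000, "indexValue": 43693.93,
     "formattedEffectiveDate": "01-Jul-2011"},
]

df = pd.json_normalize(index_levels)

# Keep only the two columns of interest; copy() avoids modifying a view of df.
series = df[["effectiveDate", "indexValue"]].copy()

# Assumption: effectiveDate is epoch time in milliseconds.
series["effectiveDate"] = pd.to_datetime(series["effectiveDate"], unit="ms")

print(series)
```

If the assumption about the unit holds, the first row's date falls on 30 June 2011, which matches the `formattedEffectiveDate` value in the response.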