使用的链接:http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00它有三个国家在一段时间内的通货膨胀数据。这是我到目前为止的代码:
import urllib.request, json
import pandas as pd
import requests
from pandas.io.json import json_normalize
url = r"http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00"
with urllib.request.urlopen(url) as url:
data = json.loads(url.read().decode())
df2 = pd.DataFrame(pd.json_normalize(data))
print(df)
这是输出,这是相当远,我需要的…:我需要JSON文件中提到的每个日期的3个国家的HCIP。
version label
0 2.0 HICP (2015 = 100) - monthly data (monthly rate...
href source updated
0 http://ec.europa.eu/eurostat/wdds/rest/data/v2... Eurostat 2021-08-31
class id size status.0 status.1 ...
0 dataset [unit, coicop, geo, time] [1, 1, 3, 307] d d ...
dimension.time.category.label.2020M11 dimension.time.category.label.2020M12
0 2020M11 2020M12
dimension.time.category.label.2021M01 dimension.time.category.label.2021M02
0 2021M01 2021M02
dimension.time.category.label.2021M03 dimension.time.category.label.2021M04
0 2021M03 2021M04
dimension.time.category.label.2021M05 dimension.time.category.label.2021M06
0 2021M05 2021M06
dimension.time.category.label.2021M07 dimension.time.category.label.2021M08
0 2021M07 2021M08
你知道我做错了什么吗?或者如何正确导入这种格式?
根据我对你的数据的理解:
data['id']
指定尺寸顺序data['dimension'][*]['category']
指定每个维度的值顺序data['value']
为原始值。
其余的似乎都是多余的。
现在最简单的是提取值,让我们也确保索引被正确排序:
>>> val = pd.Series(data['value']).rename(index=int).sort_index()
>>> val
0 2.4
1 1.6
2 1.7
3 2.2
4 0.8
...
916 0.2
917 0.6
918 0.4
919 0.5
920 0.2
Length: 921, dtype: float64
然后,对于每个维度,我们可以从信息构造一个简单的数据框架,例如geo
:
>>> pd.DataFrame({key: val for key, val in data['dimension']['geo']['category'].items()})
index label
HU 0 Hungary
PL 1 Poland
SK 2 Slovakia
按列index
排序,我们将得到label
的顺序,并根据data['id']
排序,我们得到:
>>> dimensions = [pd.DataFrame({
... key: val for key, val in data['dimension'][dim]['category'].items()
... }).sort_values('index')['label'].values for dim in data['id']]
>>> dimensions
[array(['Monthly rate of change'], dtype=object), array(['All-items HICP'], dtype=object), array(['Hungary', 'Poland', 'Slovakia'], dtype=object), array(['1996M02', '1996M03', '1996M04', '1996M05', '1996M06', '1996M07',
'1996M08', '1996M09', '1996M10', '1996M11', '1996M12', '1997M01',
'1997M02', '1997M03', '1997M04', '1997M05', '1997M06', '1997M07',
'1997M08', '1997M09', '1997M10', '1997M11', '1997M12', '1998M01',
'1998M02', '1998M03', '1998M04', '1998M05', '1998M06', '1998M07',
'1998M08', '1998M09', '1998M10', '1998M11', '1998M12', '1999M01',
'1999M02', '1999M03', '1999M04', '1999M05', '1999M06', '1999M07',
'1999M08', '1999M09', '1999M10', '1999M11', '1999M12', '2000M01',
'2000M02', '2000M03', '2000M04', '2000M05', '2000M06', '2000M07',
'2000M08', '2000M09', '2000M10', '2000M11', '2000M12', '2001M01',
'2001M02', '2001M03', '2001M04', '2001M05', '2001M06', '2001M07',
'2001M08', '2001M09', '2001M10', '2001M11', '2001M12', '2002M01',
'2002M02', '2002M03', '2002M04', '2002M05', '2002M06', '2002M07',
'2002M08', '2002M09', '2002M10', '2002M11', '2002M12', '2003M01',
'2003M02', '2003M03', '2003M04', '2003M05', '2003M06', '2003M07',
'2003M08', '2003M09', '2003M10', '2003M11', '2003M12', '2004M01',
'2004M02', '2004M03', '2004M04', '2004M05', '2004M06', '2004M07',
'2004M08', '2004M09', '2004M10', '2004M11', '2004M12', '2005M01',
'2005M02', '2005M03', '2005M04', '2005M05', '2005M06', '2005M07',
'2005M08', '2005M09', '2005M10', '2005M11', '2005M12', '2006M01',
'2006M02', '2006M03', '2006M04', '2006M05', '2006M06', '2006M07',
'2006M08', '2006M09', '2006M10', '2006M11', '2006M12', '2007M01',
'2007M02', '2007M03', '2007M04', '2007M05', '2007M06', '2007M07',
'2007M08', '2007M09', '2007M10', '2007M11', '2007M12', '2008M01',
'2008M02', '2008M03', '2008M04', '2008M05', '2008M06', '2008M07',
'2008M08', '2008M09', '2008M10', '2008M11', '2008M12', '2009M01',
'2009M02', '2009M03', '2009M04', '2009M05', '2009M06', '2009M07',
'2009M08', '2009M09', '2009M10', '2009M11', '2009M12', '2010M01',
'2010M02', '2010M03', '2010M04', '2010M05', '2010M06', '2010M07',
'2010M08', '2010M09', '2010M10', '2010M11', '2010M12', '2011M01',
'2011M02', '2011M03', '2011M04', '2011M05', '2011M06', '2011M07',
'2011M08', '2011M09', '2011M10', '2011M11', '2011M12', '2012M01',
'2012M02', '2012M03', '2012M04', '2012M05', '2012M06', '2012M07',
'2012M08', '2012M09', '2012M10', '2012M11', '2012M12', '2013M01',
'2013M02', '2013M03', '2013M04', '2013M05', '2013M06', '2013M07',
'2013M08', '2013M09', '2013M10', '2013M11', '2013M12', '2014M01',
'2014M02', '2014M03', '2014M04', '2014M05', '2014M06', '2014M07',
'2014M08', '2014M09', '2014M10', '2014M11', '2014M12', '2015M01',
'2015M02', '2015M03', '2015M04', '2015M05', '2015M06', '2015M07',
'2015M08', '2015M09', '2015M10', '2015M11', '2015M12', '2016M01',
'2016M02', '2016M03', '2016M04', '2016M05', '2016M06', '2016M07',
'2016M08', '2016M09', '2016M10', '2016M11', '2016M12', '2017M01',
'2017M02', '2017M03', '2017M04', '2017M05', '2017M06', '2017M07',
'2017M08', '2017M09', '2017M10', '2017M11', '2017M12', '2018M01',
'2018M02', '2018M03', '2018M04', '2018M05', '2018M06', '2018M07',
'2018M08', '2018M09', '2018M10', '2018M11', '2018M12', '2019M01',
'2019M02', '2019M03', '2019M04', '2019M05', '2019M06', '2019M07',
'2019M08', '2019M09', '2019M10', '2019M11', '2019M12', '2020M01',
'2020M02', '2020M03', '2020M04', '2020M05', '2020M06', '2020M07',
'2020M08', '2020M09', '2020M10', '2020M11', '2020M12', '2021M01',
'2021M02', '2021M03', '2021M04', '2021M05', '2021M06', '2021M07',
'2021M08'], dtype=object)]
现在简单地用这个来定义你的数据的索引,你就完成了:
>>> val.index = pd.MultiIndex.from_product(dimensions, names=data['id'])
>>> val
unit coicop geo time
Monthly rate of change All-items HICP Hungary 1996M02 2.4
1996M03 1.6
1996M04 1.7
1996M05 2.2
1996M06 0.8
...
Slovakia 2021M04 0.2
2021M05 0.6
2021M06 0.4
2021M07 0.5
2021M08 0.2
Length: 921, dtype: float64
现在stack
和unstack
你可以根据自己的意愿透视维度:
>>> val.unstack('geo')
geo Hungary Poland Slovakia
unit coicop time
Monthly rate of change All-items HICP 1996M02 2.4 1.4 0.3
1996M03 1.6 1.6 0.2
1996M04 1.7 2.0 0.3
1996M05 2.2 1.3 0.5
1996M06 0.8 0.9 0.2
... ... ... ...
2021M04 0.8 0.7 0.2
2021M05 0.6 0.4 0.6
2021M06 0.4 0.1 0.4
2021M07 0.6 0.4 0.5
2021M08 0.2 0.3 0.2
[307 rows x 3 columns]