I have a JSON file that looks like this:
{
    "data": {
        "success": true,
        "timeseries": true,
        "start_date": "2022-10-01",
        "end_date": "2022-10-04",
        "base": "EUR",
        "rates": {
            "2022-10-01": {
                "NG": 0.1448939471560284
            },
            "2022-10-02": {
                "NG": 0.14487923291390148
            },
            "2022-10-03": {
                "NG": 0.1454857922753868
            },
            "2022-10-04": {
                "NG": 0.1507352356663182
            }
        },
        "unit": "per MMBtu"
    }
}
I want to create a DataFrame like this:
Date NG base
2022-10-01 0.144894 EUR
2022-10-02 0.144879 EUR
2022-10-03 0.145486 EUR
2022-10-04 0.150735 EUR
Here is what I tried:
import json
import pandas as pd

with open(r'C:\Users\EH\Desktop\tools\json_files\blue_file.json', 'r') as f:
    data = json.loads(f.read())
df1 = pd.DataFrame(data['data']['rates'])
df1 = df1.T
df2 = pd.DataFrame(data['data'])
df2 = df2.base
merge = [df1, df2]
df3 = pd.concat(merge)
print(df3)
My current output:
NG 0
2022-10-01 0.144894 NaN
2022-10-02 0.144879 NaN
2022-10-03 0.145486 NaN
2022-10-04 0.150735 NaN
2022-10-01 NaN EUR
2022-10-02 NaN EUR
2022-10-03 NaN EUR
2022-10-04 NaN EUR
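As you can see, something is going wrong, and I don't understand where the NaNs are coming from.
Assuming the contents of the file have been loaded into a variable named data: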
data = { "data": { "success": True, "timeseries": True, "start_date": "2022-10-01", "end_date": "2022-10-04", "base": "EUR", "rates": { "2022-10-01": { "NG": 0.1448939471560284 }, "2022-10-02": { "NG": 0.14487923291390148 }, "2022-10-03": { "NG": 0.1454857922753868 }, "2022-10-04": { "NG": 0.1507352356663182 } }, "unit": "per MMBtu" } }
First, read the rates into a DataFrame with pandas.DataFrame.from_dict, as follows:
df = pd.DataFrame.from_dict(data['data']['rates'], orient='index')
[Out]:
NG
2022-10-01 0.144894
2022-10-02 0.144879
2022-10-03 0.145486
2022-10-04 0.150735
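To make the role of orient='index' concrete, here is a small sketch using two of the dates from data. With orient='index' the outer keys (the dates) become row labels; with the default orient='columns' they become column labels, which is why the question's pd.DataFrame(data['data']['rates']) needed a .T afterwards.

```python
import pandas as pd

# Two of the date -> {symbol: rate} entries from the question's JSON
rates = {
    "2022-10-01": {"NG": 0.1448939471560284},
    "2022-10-02": {"NG": 0.14487923291390148},
}

# orient="index": outer keys (dates) become rows, inner keys (NG) become columns
by_index = pd.DataFrame.from_dict(rates, orient="index")

# Default orient="columns": outer keys become columns instead
by_columns = pd.DataFrame.from_dict(rates)

print(by_index.shape)    # (2, 1)
print(by_columns.shape)  # (1, 2)

# The two layouts are transposes of each other
print(by_index.to_dict() == by_columns.T.to_dict())  # True
```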
However, since we don't want the dates to be the index, and we want that column to be named Date, we start by resetting the index with pandas.DataFrame.reset_index:
df = df.reset_index()
[Out]:
index NG
0 2022-10-01 0.144894
1 2022-10-02 0.144879
2 2022-10-03 0.145486
3 2022-10-04 0.150735
Then rename the column to match the desired output with pandas.DataFrame.rename:
df = df.rename(columns={'index':'Date'})
[Out]:
Date NG
0 2022-10-01 0.144894
1 2022-10-02 0.144879
2 2022-10-03 0.145486
3 2022-10-04 0.150735
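The reset_index and rename steps can also be collapsed by naming the index before resetting it: rename_axis sets the index name, and reset_index then uses that name for the new column. A minimal sketch on the same rates (pandas 1.5 and later should also accept df.reset_index(names='Date'), but that depends on the installed version, so the sketch sticks to rename_axis):

```python
import pandas as pd

rates = {
    "2022-10-01": {"NG": 0.1448939471560284},
    "2022-10-02": {"NG": 0.14487923291390148},
    "2022-10-03": {"NG": 0.1454857922753868},
    "2022-10-04": {"NG": 0.1507352356663182},
}

df = pd.DataFrame.from_dict(rates, orient="index")
# Name the index "Date"; reset_index moves it into a column with that name
df = df.rename_axis("Date").reset_index()
print(df.columns.tolist())  # ['Date', 'NG']
```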
It is still missing the base column, which can be taken from data as follows:
df['base'] = data['data']['base']
[Out]:
Date NG base
0 2022-10-01 0.144894 EUR
1 2022-10-02 0.144879 EUR
2 2022-10-03 0.145486 EUR
3 2022-10-04 0.150735 EUR
Even though we already have the desired output, all of these steps can also be chained into a single line:
df = pd.DataFrame.from_dict(data['data']['rates'], orient='index').reset_index().rename(columns={'index':'Date'}).assign(base=data['data']['base'])
[Out]:
Date NG base
0 2022-10-01 0.144894 EUR
1 2022-10-02 0.144879 EUR
2 2022-10-03 0.145486 EUR
3 2022-10-04 0.150735 EUR
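If the file has to be read repeatedly, the one-liner can be wrapped in a small helper that parses the JSON straight from an open file. A sketch, where load_rates is a hypothetical name and an in-memory io.StringIO stands in for blue_file.json:

```python
import io
import json

import pandas as pd


def load_rates(fp):
    """Hypothetical helper: parse a timeseries JSON shaped like the question's
    from an open file object and return a Date / NG / base DataFrame."""
    payload = json.load(fp)["data"]
    return (
        pd.DataFrame.from_dict(payload["rates"], orient="index")
        .reset_index()                         # unnamed index becomes a column called "index"
        .rename(columns={"index": "Date"})
        .assign(base=payload["base"])          # scalar is broadcast to every row
    )


# Usage: an in-memory stand-in for blue_file.json, trimmed to two dates
sample = io.StringIO(json.dumps({
    "data": {
        "base": "EUR",
        "rates": {
            "2022-10-01": {"NG": 0.1448939471560284},
            "2022-10-02": {"NG": 0.14487923291390148},
        },
    }
}))
df = load_rates(sample)
print(df)
```

With the real file this would be used as: with open("blue_file.json") as f: df = load_rates(f)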
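With what you did, you only need to specify the axis along which the two DataFrames are concatenated; by default (axis=0) they are stacked one below the other.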
pd.concat(merge, axis=1) # default 0
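To see where the NaNs come from, a small sketch with two objects shaped like the question's df1 and df2, trimmed to two dates. Along axis=0 the rows are stacked and the columns are the union of both inputs, so every row lacks a value for the other input's column. Along axis=1 the inputs are aligned on the shared date index instead.

```python
import pandas as pd

dates = ["2022-10-01", "2022-10-02"]
df1 = pd.DataFrame({"NG": [0.1448939471560284, 0.14487923291390148]}, index=dates)
df2 = pd.Series(["EUR", "EUR"], index=dates, name="base")

# axis=0 (default): 4 rows, and each row is missing one of the two columns
stacked = pd.concat([df1, df2])
print(stacked.shape)  # (4, 2)

# axis=1: 2 rows, NG and base side by side, aligned on the date index
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)
```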
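JSON is a very flexible format, and no single pandas function can parse every shape of JSON. You need to preprocess the JSON before building the DataFrame:
import json
import pandas as pd

with open("data.json") as fp:
    data = json.load(fp)
# Build (date, rate) pairs from the nested "rates" dict, then add the base currency
df = pd.DataFrame(
    [(date, rate["NG"]) for date, rate in data["data"]["rates"].items()],
    columns=["Date", "NG"],
).assign(base=data["data"]["base"])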
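The comprehension above hard-codes the "NG" key. If each date could carry more than one symbol, a variant that builds one dict per row and lets pandas take the column names from the keys avoids that. A sketch assuming the same data layout as in the question:

```python
import pandas as pd

data = {"data": {
    "base": "EUR",
    "rates": {
        "2022-10-01": {"NG": 0.1448939471560284},
        "2022-10-02": {"NG": 0.14487923291390148},
        "2022-10-03": {"NG": 0.1454857922753868},
        "2022-10-04": {"NG": 0.1507352356663182},
    },
}}

# One record per date: the date plus every symbol/rate pair stored under it
rows = [{"Date": date, **rate} for date, rate in data["data"]["rates"].items()]
df = pd.DataFrame(rows).assign(base=data["data"]["base"])
print(df.columns.tolist())  # ['Date', 'NG', 'base']
```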
You're almost there. You get NaNs because you need to specify a different axis when concatenating the DataFrames: pd.concat(merge, axis=1).
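Applied to the code in the question, the fix might look like the sketch below. The JSON is inlined as data rather than read from blue_file.json, and the final rename_axis("Date").reset_index() is an addition (not part of the answer above) that turns the date index into the Date column of the desired output.

```python
import pandas as pd

# The question's JSON, inlined instead of read from blue_file.json
data = {"data": {
    "success": True, "timeseries": True,
    "start_date": "2022-10-01", "end_date": "2022-10-04", "base": "EUR",
    "rates": {
        "2022-10-01": {"NG": 0.1448939471560284},
        "2022-10-02": {"NG": 0.14487923291390148},
        "2022-10-03": {"NG": 0.1454857922753868},
        "2022-10-04": {"NG": 0.1507352356663182},
    },
    "unit": "per MMBtu",
}}

df1 = pd.DataFrame(data["data"]["rates"]).T  # dates as rows, NG as the column
df2 = pd.DataFrame(data["data"])["base"]     # Series named "base", indexed by the same dates
df3 = pd.concat([df1, df2], axis=1)          # columns side by side instead of stacked rows
df3 = df3.rename_axis("Date").reset_index()  # move the date index into a "Date" column
print(df3)
```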
Here is my version:
import json
import pandas as pd

with open("blue_file.json", "r") as f:
    data = json.loads(f.read())
df = pd.DataFrame.from_dict(data["data"]["rates"], orient="index")
df["base"] = data["data"]["base"]
df
Output:
NG base
2022-10-01 0.144894 EUR
2022-10-02 0.144879 EUR
2022-10-03 0.145486 EUR
2022-10-04 0.150735 EUR
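Keeping the dates as the index, as this version does, pairs well with pandas' time-series tools once the string labels are parsed into a DatetimeIndex. A sketch; the conversion and the slice are additions for illustration, not part of the answer above:

```python
import pandas as pd

data = {"data": {
    "base": "EUR",
    "rates": {
        "2022-10-01": {"NG": 0.1448939471560284},
        "2022-10-02": {"NG": 0.14487923291390148},
        "2022-10-03": {"NG": 0.1454857922753868},
        "2022-10-04": {"NG": 0.1507352356663182},
    },
}}

df = pd.DataFrame.from_dict(data["data"]["rates"], orient="index")
df["base"] = data["data"]["base"]

# Parse the string labels into a DatetimeIndex and give it a name
df.index = pd.to_datetime(df.index)
df.index.name = "Date"

# Partial-string slicing on a DatetimeIndex includes both endpoints
subset = df.loc["2022-10-02":"2022-10-03"]
print(subset)
```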
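Simply:
import json
import pandas as pd

with open(r'path.json', 'r') as f:
    data = json.loads(f.read())
# Scalars are broadcast across the dates taken from the nested "rates" dict
df = pd.DataFrame(data['data']).reset_index()
# Expand the {"NG": ...} dicts in the "rates" column into an NG column
df = df.join(pd.json_normalize(df.pop('rates')))
print(df)  # prints the intermediate frame shown below
df = df[['index', 'NG', 'base']]
df = df.rename(columns={'index': 'Date'})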
'''
index success timeseries start_date end_date base unit NG
0 2022-10-01 true true 2022-10-01 2022-10-04 EUR per MMBtu 0.1448939471560284
1 2022-10-02 true true 2022-10-01 2022-10-04 EUR per MMBtu 0.14487923291390148
2 2022-10-03 true true 2022-10-01 2022-10-04 EUR per MMBtu 0.1454857922753868
3 2022-10-04 true true 2022-10-01 2022-10-04 EUR per MMBtu 0.1507352356663182
'''
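The pop/join step works because pd.json_normalize turns a column of dicts into one column per key, and both frames share the default 0..n-1 index that reset_index produced, so join lines the rows up. A reduced sketch of just that step, with the frame built by hand instead of from the file:

```python
import pandas as pd

df = pd.DataFrame({
    "index": ["2022-10-01", "2022-10-02"],
    "base": ["EUR", "EUR"],
    "rates": [{"NG": 0.1448939471560284}, {"NG": 0.14487923291390148}],
})

# pop removes "rates" and returns it as a Series of dicts;
# json_normalize expands each dict into columns (here only "NG")
expanded = pd.json_normalize(df.pop("rates"))

# Both frames use the default RangeIndex, so join pairs row 0 with row 0, and so on
df = df.join(expanded).rename(columns={"index": "Date"})[["Date", "NG", "base"]]
print(df)
```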