How to create a pandas DataFrame from a nested JSON file



I have a JSON file that looks like this:

{
    "data": {
        "success": true,
        "timeseries": true,
        "start_date": "2022-10-01",
        "end_date": "2022-10-04",
        "base": "EUR",
        "rates": {
            "2022-10-01": {
                "NG": 0.1448939471560284
            },
            "2022-10-02": {
                "NG": 0.14487923291390148
            },
            "2022-10-03": {
                "NG": 0.1454857922753868
            },
            "2022-10-04": {
                "NG": 0.1507352356663182
            }
        },
        "unit": "per MMBtu"
    }
}

I want to create a DataFrame that looks like this:

Date        NG        base 
2022-10-01  0.144894  EUR
2022-10-02  0.144879  EUR
2022-10-03  0.145486  EUR
2022-10-04  0.150735  EUR

This is what I tried:

import json
import pandas as pd

with open(r'C:\Users\EH\Desktop\tools\json_files\blue_file.json', 'r') as f:
    data = json.loads(f.read())
df1 = pd.DataFrame(data['data']['rates'])
df1 = df1.T
df2 = pd.DataFrame(data['data'])
df2 = df2.base
merge = [df1, df2]
df3 = pd.concat(merge)
print(df3)

My current output:

                  NG    0
2022-10-01  0.144894  NaN
2022-10-02  0.144879  NaN
2022-10-03  0.145486  NaN
2022-10-04  0.150735  NaN
2022-10-01       NaN  EUR
2022-10-02       NaN  EUR
2022-10-03       NaN  EUR
2022-10-04       NaN  EUR

As you can see, something is going wrong, and I don't understand where the NaN values come from.

Assuming the contents of the file are stored in a variable named data:

data = { "data": { "success": True, "timeseries": True, "start_date": "2022-10-01", "end_date": "2022-10-04", "base": "EUR", "rates": { "2022-10-01": { "NG": 0.1448939471560284 }, "2022-10-02": { "NG": 0.14487923291390148 }, "2022-10-03": { "NG": 0.1454857922753868 }, "2022-10-04": { "NG": 0.1507352356663182 } }, "unit": "per MMBtu" } }

First, read the data into a DataFrame with pandas.DataFrame.from_dict, as follows:

df = pd.DataFrame.from_dict(data['data']['rates'], orient='index')
[Out]:
                  NG
2022-10-01  0.144894
2022-10-02  0.144879
2022-10-03  0.145486
2022-10-04  0.150735

However, since you don't want the dates to be the index and you want that column to be named Date, start by resetting the index with pandas.DataFrame.reset_index:

df = df.reset_index()
[Out]:
        index        NG
0  2022-10-01  0.144894
1  2022-10-02  0.144879
2  2022-10-03  0.145486
3  2022-10-04  0.150735

Then rename the column to match the desired output with pandas.DataFrame.rename:

df = df.rename(columns={'index':'Date'})
[Out]:
         Date        NG
0  2022-10-01  0.144894
1  2022-10-02  0.144879
2  2022-10-03  0.145486
3  2022-10-04  0.150735

The base column is still missing. It can be taken from data, as follows:

df['base'] = data['data']['base']
[Out]:
         Date        NG base
0  2022-10-01  0.144894  EUR
1  2022-10-02  0.144879  EUR
2  2022-10-03  0.145486  EUR
3  2022-10-04  0.150735  EUR

That already gives the desired output, but everything can also be wrapped into a single line, as follows:

df = pd.DataFrame.from_dict(data['data']['rates'], orient='index').reset_index().rename(columns={'index':'Date'}).assign(base=data['data']['base'])
[Out]:
         Date        NG base
0  2022-10-01  0.144894  EUR
1  2022-10-02  0.144879  EUR
2  2022-10-03  0.145486  EUR
3  2022-10-04  0.150735  EUR
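
For reuse, the steps above can be collected in a small function. This is only a sketch: the name rates_to_frame is hypothetical, and it assumes the payload has the same shape as the sample (data['data']['rates'] keyed by date, plus data['data']['base']). It names the index with rename_axis before resetting it, which is an alternative to the separate rename step.

```python
import pandas as pd


def rates_to_frame(payload):
    """Hypothetical helper: flatten a payload shaped like the sample file."""
    inner = payload["data"]
    return (
        pd.DataFrame.from_dict(inner["rates"], orient="index")
        .rename_axis("Date")         # name the index so reset_index yields a 'Date' column
        .reset_index()
        .assign(base=inner["base"])  # the scalar is repeated on every row
    )


sample = {
    "data": {
        "base": "EUR",
        "rates": {
            "2022-10-01": {"NG": 0.1448939471560284},
            "2022-10-02": {"NG": 0.14487923291390148},
            "2022-10-03": {"NG": 0.1454857922753868},
            "2022-10-04": {"NG": 0.1507352356663182},
        },
    }
}

df = rates_to_frame(sample)
print(df)
```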

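With what you already have, you only need to specify the axis along which to concatenate the two DataFrames. By default (axis=0) they are stacked one below the other.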

pd.concat(merge, axis=1) # default 0
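
To see where the NaN values come from, here is a small sketch using two dates from the sample (the variable names are illustrative, not from the question). With axis=0 the Series is appended below the frame as extra rows, so every row is missing a value in one of the two columns. With axis=1 the two objects are aligned on their shared date index and placed side by side.

```python
import pandas as pd

dates = ["2022-10-01", "2022-10-02"]
rates = pd.DataFrame({"NG": [0.144894, 0.144879]}, index=dates)
base = pd.Series("EUR", index=dates, name="base")

# Default axis=0: rows of `base` are appended below the rows of `rates`
stacked = pd.concat([rates, base])

# axis=1: the objects are aligned on the index and become columns
side_by_side = pd.concat([rates, base], axis=1)

print(stacked)
print(side_by_side)
```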

JSON is a very flexible format, and pandas has no single function that can parse every kind of JSON. You need to preprocess the JSON before building the DataFrame:

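import json
import pandas as pd

with open("data.json") as fp:
    data = json.load(fp)

# Build one (date, rate) row per entry, then add the constant base column
df = pd.DataFrame(
    [(date, rate["NG"]) for date, rate in data["data"]["rates"].items()],
    columns=["Date", "NG"],
).assign(base=data["data"]["base"])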
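
The same preprocessing idea can be written without hard-coding the "NG" key by building one record per date. This is a sketch, not part of the original answer: it parses a shortened copy of the sample from a string (the variable raw is illustrative) so that it runs without a file on disk.

```python
import json

import pandas as pd

# Shortened copy of the sample, kept as a string so no file is needed
raw = """
{"data": {"base": "EUR",
          "rates": {"2022-10-01": {"NG": 0.1448939471560284},
                    "2022-10-02": {"NG": 0.14487923291390148}}}}
"""
data = json.loads(raw)

# One dict per date; **rate copies every key of the inner dict into the record
records = [{"Date": date, **rate} for date, rate in data["data"]["rates"].items()]
df = pd.DataFrame(records).assign(base=data["data"]["base"])
print(df)
```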

You were almost there. You get NaNs because you need to specify a different axis when concatenating the DataFrames: pd.concat(merge, axis=1)

Here is my version:

import json
import pandas as pd

with open("blue_file.json", "r") as f:
    data = json.loads(f.read())

df = pd.DataFrame.from_dict(data["data"]["rates"], orient="index")
df["base"] = data["data"]["base"]
df

Output:

                  NG base
2022-10-01  0.144894  EUR
2022-10-02  0.144879  EUR
2022-10-03  0.145486  EUR
2022-10-04  0.150735  EUR

Simply:

import json
import pandas as pd

with open(r'path.json', 'r') as f:
    data = json.loads(f.read())

df = pd.DataFrame(data['data']).reset_index()
# Expand the dicts in the 'rates' column into their own columns
df = df.join(pd.json_normalize(df.pop('rates')))
print(df)
'''
        index  success  timeseries  start_date    end_date  base       unit                   NG
0  2022-10-01     true        true  2022-10-01  2022-10-04   EUR  per MMBtu   0.1448939471560284
1  2022-10-02     true        true  2022-10-01  2022-10-04   EUR  per MMBtu  0.14487923291390148
2  2022-10-03     true        true  2022-10-01  2022-10-04   EUR  per MMBtu   0.1454857922753868
3  2022-10-04     true        true  2022-10-01  2022-10-04   EUR  per MMBtu   0.1507352356663182
'''
# Keep only the wanted columns and rename 'index' to 'Date'
df = df[['index', 'NG', 'base']]
df = df.rename(columns={'index': 'Date'})
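
The key step in the snippet above is pd.json_normalize, which expands a sequence of dicts into columns. A minimal sketch on two of the sample's rate entries (the variable names are illustrative):

```python
import pandas as pd

# Two of the inner dicts from data['data']['rates']
rates = [{"NG": 0.1448939471560284}, {"NG": 0.14487923291390148}]

# Each dict becomes one row; each key becomes a column
flat = pd.json_normalize(rates)
print(flat)
```

For a list input the result gets a fresh 0..n-1 index, which is why the snippet above calls reset_index() before join: both frames then share the same integer index and the rows line up.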
