使用相关键规范化 json



我有这样的JSON:

in_str='''{
"prices": [
[1, 10],
[2, 20],
[3, 30]    
],
"total_volumes": [
[1, 100],
[2, 200],
[3, 300]
]
}'''

我正在尝试生成具有 3 列(id、价格、交易量(的熊猫数据框:

1 10 100
2 20 200
3 30 300

我尝试使用pandas.read_json(),但这给了我两列,我不知道下一步该去哪里。json_normalize()只给我一行两列。

import pandas as pd
import json
from pandas.io.json import json_normalize
in_str='''{
"prices": [
[1, 10],
[2, 20],
[3, 30]    
],
"total_volumes": [
[1, 10],
[2, 20],
[3, 30]
]
}'''
df = pd.read_json(in_str)
json_normalize(json.loads(in_str))

我们可以像往常一样阅读,重新转换并合并:

df = pd.read_json(in_str)
pd.merge(*[pd.DataFrame(np.array(df[col].to_list()), 
columns=['id', col]) for col in df],
on='id')

输出:

id  prices  total_volumes
0   1      10            100
1   2      20            200
2   3      30            300
pd.read_json

没有适合此结构的orient参数。看起来转换 json 并直接构造数据帧更容易:

>>> pd.DataFrame({key: dict(value) for key, value in json.loads(in_str).items()})
prices  total_volumes
1      10            100
2      20            200
3      30            300

您可以将 json 预处理为适当的字典并使用它来构造数据帧

import ast
d = ast.literal_eval(in_str)
d1 = {k: dict(v) for k, v in d.items()}
df = pd.DataFrame(d1).rename_axis('id').reset_index()
Out[857]:
id  prices  total_volumes
0   1      10            100
1   2      20            200
2   3      30            300

如果您还没有使用它,您不一定需要下载pandas。这是一个解决方案,它使用内置的json解析器来读取文件,并使用本机数据结构将数据处理成所需的形状(也许还有更有用的形状(。

import json
in_str='''{
"prices": [
[1, 10],
[2, 20],
[3, 30]    
],
"total_volumes": [
[1, 100],
[2, 200],
[3, 300]
]
}'''
in_json = json.loads(in_str)
# you can use json.load(filename) if you're using a file here.
print(in_json)
'''
>>> {'prices': [[1, 10], [2, 20], [3, 30]], 'total_volumes': [[1, 100], [2, 200], [3, 300]]}
'''
# Here we're going to merge the two data sets to make them iterable in one go.
inventory = dict()
for item_id, price in in_json["prices"]:
inventory[item_id] = {"price": price}
for item_id, volume in in_json["total_volumes"]:
if isinstance(inventory.get(item_id), dict):
inventory[item_id]["volume"] = volume
else:
inventory[item_id] = {"volume": volume}
print(inventory)
'''
>>> {1: {'price': 10, 'volume': 100}, 2: {'price': 20, 'volume': 200}, 3: {'price': 30, 'volume': 300}}
'''
# Now that the data is all in one dict, we can just iterate through it to get the rows in the shape that you want.
inventory_table = list()
for item_id, info in inventory.items():
row = [item_id, info.get("price"), info.get("volume")]
print(row)
'''
>>> [1, 10, 100]
>>> [2, 20, 200]
>>> [3, 30, 300]
'''
inventory_table.append(row)
# the final form
print(inventory_table)
'''
>>> [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
'''

现在我们有了这个作为基线,我们可以制作一些人们在 Python 中流口水的单行代码:

import json
in_str='''{
"prices": [
[1, 10],
[2, 20],
[3, 30]    
],
"total_volumes": [
[1, 100],
[2, 200],
[3, 300]
]
}'''
in_json = json.loads(in_str)
inventory = {item_id: {"price": price} for item_id, price in in_json["prices"]}
for item_id, volume in in_json["total_volumes"]:
if isinstance(inventory.get(item_id), dict):
inventory[item_id]["volume"] = volume
else:
inventory[item_id] = {"volume": volume}
print(inventory)
inventory_table = [[item_id, info.get("price"), info.get("volume")] for item_id, info in inventory.items()]
print(inventory_table)
'''
>>> [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
'''

最新更新