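Below is my JSON sample. When I convert the JSON to a CSV file, it creates a separate column for every object in the reviews array. The column names look like this: serial, name.0, rating.0, _id.0, name.1, rating.1, _id.1. How can I convert it to a CSV file in which only serial, name, rating and _id are the column names, and each object in reviews goes on its own row?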
[{
    "serial": "63708940a8d291c502be815f",
    "reviews": [
        {
            "name": "shadman",
            "rating": 4,
            "_id":"6373d4eb50cff661989f3d83"
        },
        {
            "name": "niloy1",
            "rating": 3,
            "_id": "6373d59450cff661989f3db8"
        },
    ],
}]
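I am trying to get this CSV file into pandas. If that is not possible, is there any way to solve this problem using the pandas package in Python?
I suggest you use pandas only for the CSV export, and first process the JSON data by flattening the data structure so that the result can be loaded easily into a pandas DataFrame.
Try: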
from collections import defaultdict
from pprint import pprint
import pandas as pd

data_python = [{
    "serial": "63708940a8d291c502be815f",
    "reviews": [
        {
            "name": "shadman",
            "rating": 4,
            "_id": "6373d4eb50cff661989f3d83"
        },
        {
            "name": "niloy1",
            "rating": 3,
            "_id": "6373d59450cff661989f3db8"
        },
    ],
}]

# Collect one list of values per column name.
dct_flat = defaultdict(list)
for dct in data_python:
    for dct_reviews in dct["reviews"]:
        # Repeat the parent serial once for every review, so each review becomes a row.
        dct_flat['serial'].append(dct['serial'])
        for key, value in dct_reviews.items():
            dct_flat[key].append(value)

# Uncomment to inspect the intermediate structures:
#pprint(data_python)
#pprint(dct_flat)
df = pd.DataFrame(dct_flat)
print(df)
# Writes the DataFrame index as an unnamed first column.
df.to_csv("data.csv")
gives:
                     serial     name  rating                       _id
0  63708940a8d291c502be815f  shadman       4  6373d4eb50cff661989f3d83
1  63708940a8d291c502be815f   niloy1       3  6373d59450cff661989f3db8
and
,serial,name,rating,_id
0,63708940a8d291c502be815f,shadman,4,6373d4eb50cff661989f3d83
1,63708940a8d291c502be815f,niloy1,3,6373d59450cff661989f3db8
as the contents of the CSV file.
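As an alternative to the manual loop, pandas can do the flattening itself. This is only a minimal sketch, assuming pandas 1.0 or later (where `pd.json_normalize` is available at the top level); the helper name `flatten_reviews` is made up for illustration and is not part of pandas.

```python
import pandas as pd

data_python = [{
    "serial": "63708940a8d291c502be815f",
    "reviews": [
        {"name": "shadman", "rating": 4, "_id": "6373d4eb50cff661989f3d83"},
        {"name": "niloy1", "rating": 3, "_id": "6373d59450cff661989f3db8"},
    ],
}]


def flatten_reviews(records):
    """Hypothetical helper: one row per review, with the parent serial repeated."""
    # record_path selects the nested list that is expanded into rows;
    # meta copies the listed parent-level fields onto every row.
    df = pd.json_normalize(records, record_path="reviews", meta=["serial"])
    # json_normalize places the meta columns last, so reorder them
    # to match the column order asked for in the question.
    return df[["serial", "name", "rating", "_id"]]


df = flatten_reviews(data_python)

# With no path argument, to_csv returns the CSV text as a string.
# index=False leaves out the row-number column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing `index=False` also works with the loop-based version if the leading unnamed index column is not wanted in the file.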
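Note that the JSON you provided in your question cannot be loaded from a file or a string in Python, neither with the Python json module nor with pandas, because it is not valid JSON: it has trailing commas after the last review object and after the reviews array. See the corrected, valid JSON data below: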
valid_json_data='''
[{
    "serial": "63708940a8d291c502be815f",
    "reviews": [
        {
            "name": "shadman",
            "rating": 4,
            "_id":"6373d4eb50cff661989f3d83"
        },
        {
            "name": "niloy1",
            "rating": 3,
            "_id": "6373d59450cff661989f3db8"
        }
    ]
}]
'''
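and the code for loading this data from a JSON file:
import json

json_file = "data.json"
# Read the whole file as text, then parse it into Python lists and dicts.
with open(json_file) as f:
    data_json = f.read()
data_python = json.loads(data_json)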
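To see why the sample from the question is rejected, the following sketch feeds both variants to `json.loads`. The strings are shortened versions of the data above, and the variable names are chosen only for illustration.

```python
import json

# Trailing commas after the last list element and after the last key,
# as in the sample from the question.
invalid_json = '[{"serial": "63708940a8d291c502be815f", "reviews": [{"name": "shadman", "rating": 4},],}]'
# The same data with the trailing commas removed.
valid_json = '[{"serial": "63708940a8d291c502be815f", "reviews": [{"name": "shadman", "rating": 4}]}]'

try:
    json.loads(invalid_json)
    parsed_invalid = True
except json.JSONDecodeError as err:
    # The json module follows the JSON grammar strictly and raises here.
    parsed_invalid = False
    print("invalid JSON:", err)

data = json.loads(valid_json)
print(data[0]["reviews"][0]["name"])
```

A Python literal tolerates those trailing commas, which is why the `data_python` assignment in the answer runs even though the same text fails as JSON.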