Converting CSV data to a nested JSON format



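I have some CSV data that I need to convert to a specific JSON format. I wrote code that works for some of the nesting levels, but the result is not what I need.

Here is my CSV data: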
title   context answers question    id
tit1    con1    text1   que1    id1
tit1    con1    text2   que2    id2
tit2    con2    text3   que3    id3
tit2    con2    text4   que4    id4
tit2    con3    text5   que5    id5

My code:

import json
import pandas as pd

df = pd.read_csv('processedOutput.csv')
finalList = []
finalDict = {}
grouped = df.groupby(['context'])
for key, value in grouped:
    dictionary = {}
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['context'] = j.at[0, 'context']
    dictList = []
    anotherDict = {}
    for i in j.index:
        anotherDict['answers'] = j.at[i, 'answers']
        anotherDict['question'] = j.at[i, 'question']
        anotherDict['id'] = j.at[i, 'id']
        dictList.append(anotherDict)
    dictionary['qas'] = dictList
    finalList.append(dictionary)
data = json.dumps(finalList)

The output is well structured, but every entry repeats the last element of its group:

[{"context": "con1",
"qas": [
{"answers": "text2", "question": "que2", "id": "id2"},
{"answers": "text2", "question": "que2", "id": "id2"}
]
},
{"context": "con2", 
"qas": [
{"answers": "text4", "question": "que4", "id": "id4"},
{"answers": "text4", "question": "que4", "id": "id4"}
]
},
{"context": "con3", 
"qas": [
{"answers": "text5", "question": "que5", "id": "id5"}
]
}
]

I want all of this data nested one level deeper, under each title, like this:

[
{
"title": "tit1",
"paragraph": [
{
"context": "con1",
"qas": [
{"answers": "text1","question": "que1","id": "id1"},
{"answers": "text2","question": "que2","id": "id2"}
]}]
},
{
"title": "tit2",
"paragraph": [
{
"context": "con2",
"qas": [
{"answers": "text3","question": "que3","id": "id3"},
{"answers": "text4","question": "que4","id": "id4"}
]},
{"context": "con3",
"qas": [
{"answers": "text5","question":"que5", "id": "id5"}
]
}
]
}
]  

I have been stuck on this for a long time; any suggestions would be appreciated.

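Your output data requires three levels of grouping: title, paragraph, and Q&A. I suggest using df.groupby(['title', 'context', 'answers']) to drive the loop.

Then, inside the loop, each group consists of a single dictionary (assuming the id column contains only unique values). To build the higher-level structures, all you need is some bookkeeping that detects when a level changes and appends to the appropriate lists and dictionaries. Here we use nested groupby levels to do that: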

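...
# df and finalList are set up as in the question's code.
# Grouping by a plain column name (not a one-element list) keeps k1 and k2
# as plain strings; recent pandas versions return tuples for list keys.
g1 = df.groupby('title')
for k1, v1 in g1:
    l2_para_list = []
    l4_qas_list = []
    g2 = v1.groupby('context')
    for k2, v2 in g2:
        g3 = v2.groupby('answers')
        for _, v3 in g3:
            # A new dict per Q&A row; .item() requires exactly one row per group.
            qas_dict = {}
            qas_dict['answers'] = v3.answers.item()
            qas_dict['question'] = v3.question.item()
            qas_dict['id'] = v3.id.item()
            l4_qas_list.append(qas_dict)
        # Close out the paragraph level for this context.
        l3_para_dict = {}
        l3_para_dict['context'] = k2
        l3_para_dict['qas'] = l4_qas_list
        l4_qas_list = []
        l2_para_list.append(l3_para_dict)
        l3_para_dict = {}
    # Close out the title level.
    l1_title_dict = {}
    l1_title_dict['title'] = k1
    l1_title_dict['paragraph'] = l2_para_list
    finalList.append(l1_title_dict)
    l1_title_dict = {}
    l2_para_list = []
print(json.dumps(finalList))
...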

Output (formatted for display):

[{"title": "tit1", "paragraph":
[{"context": "con1",
"qas": [{"answers": "text1", "question": "que1", "id": "id1"},
{"answers": "text2", "question": "que2", "id": "id2"}]}]},
{"title": "tit2", "paragraph":
[{"context": "con2",
"qas": [{"answers": "text3", "question": "que3", "id": "id3"},
{"answers": "text4", "question": "que4", "id": "id4"}]},
{"context": "con3",
"qas": [{"answers": "text5", "question": "que5", "id": "id5"}]}]}]
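For reference, the same nesting can be written more compactly. The following is only a sketch, not the answer's exact code: the sample rows from the question are inlined as tab-separated text so it runs without processedOutput.csv, and the names SAMPLE and to_nested are made up for illustration. It assumes one row per question, as in the sample data.

```python
import io
import json

import pandas as pd

# Sample rows from the question, inlined (tab-separated) so no file is needed.
# SAMPLE and to_nested are illustrative names, not part of the original code.
SAMPLE = (
    "title\tcontext\tanswers\tquestion\tid\n"
    "tit1\tcon1\ttext1\tque1\tid1\n"
    "tit1\tcon1\ttext2\tque2\tid2\n"
    "tit2\tcon2\ttext3\tque3\tid3\n"
    "tit2\tcon2\ttext4\tque4\tid4\n"
    "tit2\tcon3\ttext5\tque5\tid5\n"
)


def to_nested(df):
    """Group rows by title, then by context, collecting one dict per row."""
    result = []
    # Grouping by a column name (not a list) yields plain string keys;
    # sort=False keeps groups in order of first appearance.
    for title, title_rows in df.groupby("title", sort=False):
        paragraphs = []
        for context, context_rows in title_rows.groupby("context", sort=False):
            # to_dict("records") builds a separate dict for every row.
            qas = context_rows[["answers", "question", "id"]].to_dict("records")
            paragraphs.append({"context": context, "qas": qas})
        result.append({"title": title, "paragraph": paragraphs})
    return result


df = pd.read_csv(io.StringIO(SAMPLE), sep="\t")
nested = to_nested(df)
print(json.dumps(nested, indent=2))
```

Because every row gets its own dictionary here, the repeated-last-row symptom from the question does not occur. In the question's code a single anotherDict is created once per group and appended on every iteration, so each list slot refers to the same object and shows whatever values were written last.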
