Extracting each trainer's course IDs from JSON input



Suppose I have a nested JSON file like the one below. How can I print the courses that each trainer teaches?

{
  "info": {
    "source_objects": [
      {
        "type": "sub-category",
        "id": 277438897
      }
    ],
    "item_type": "course",
    "items": [
      {
        "_class": "course",
        "id": 156173119,
        "is_paid": null,
        "trainer": [
          {
            "id": 257585701,
            "url": "/user/tania_guerra/"
          }
        ]
      },
      {
        "_class": "course",
        "id": 12456,
        "is_paid": null,
        "trainer": [
          {
            "id": 257585701,
            "url": "/user/tania_guerra/"
          }
        ]
      }
      ... and more items in the same format ...
    ]
  }
}

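I'm not sure whether there is some simple trick I'm missing. So far I have tried the following, which prints the course ID and the trainer ID. But how do I collect all the courses that a given trainer teaches?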

import json

with open(alljson, 'r') as f:  # alljson: path to one JSON file (the asker has many in one directory)
    read_json = json.load(f)

for i in range(len(read_json['info']['items'])):
    cid = read_json['info']['items'][i]['id']  # gets the course id
    for j in range(len(read_json['info']['items'][i]['trainer'])):
        trainer_id = read_json['info']['items'][i]['trainer'][j]['id']  # gets the trainer id

        # then how do I get the course id added to the trainer id? For example:
        # 12456---123456***123457***123454***12454
        # trainer id --- all the courses this instructor teaches, joined with ***
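For reference, here is a minimal sketch of the loop above rewritten to iterate over the lists directly instead of indexing with range(len(...)). It collects course IDs per trainer in a plain dict and prints them in the "trainer id---course***course" format described in the comments. The inline SAMPLE string and the helper names courses_by_trainer and format_line are hypothetical, not part of the question.

```python
import json

# Hypothetical, trimmed stand-in for one of the asker's JSON files.
SAMPLE = """
{"info": {"item_type": "course",
          "items": [
            {"_class": "course", "id": 156173119, "is_paid": null,
             "trainer": [{"id": 257585701, "url": "/user/tania_guerra/"}]},
            {"_class": "course", "id": 12456, "is_paid": null,
             "trainer": [{"id": 257585701, "url": "/user/tania_guerra/"}]}
          ]}}
"""


def courses_by_trainer(data):
    """Map each trainer id to the list of course ids that trainer appears on."""
    mapping = {}
    for item in data["info"]["items"]:  # iterate the items directly, no indexes
        for trainer in item["trainer"]:
            mapping.setdefault(trainer["id"], []).append(item["id"])
    return mapping


def format_line(trainer_id, course_ids):
    """Render one line as 'trainer---course***course***...'."""
    return "%s---%s" % (trainer_id, "***".join(str(c) for c in course_ids))


data = json.loads(SAMPLE)
for tid, cids in courses_by_trainer(data).items():
    print(format_line(tid, cids))  # prints 257585701---156173119***12456
```

dict.setdefault creates the empty list the first time a trainer ID is seen, which removes the need for a separate "if key not in dict" check.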

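Assuming each trainer has a unique ID, you can build a dictionary of lists in which the keys identify the trainer and the values are lists of course IDs: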

import os, json

rootdir = 'tmp/test1'
trainers = {}
for root, dirs, files in os.walk(rootdir):
    for filename in files:
        # only process .json files
        if os.path.splitext(filename)[1] != '.json':
            continue
        filepath = os.path.join(root, filename)
        with open(filepath) as stream:
            data = json.load(stream)
        for item in data['info']['items']:
            cid = item['id']
            for trainer in item['trainer']:
                # identify a trainer by (id, url)
                key = (trainer['id'], trainer['url'])
                if key not in trainers:
                    trainers[key] = []
                trainers[key].append(str(cid))

output = 'trainers.txt'
with open(output, 'w') as stream:
    for (tid, url), cids in sorted(trainers.items()):
        stream.write('%s---%s---%s\n' % (tid, url, ';;;'.join(cids)))

Result (for the JSON files under tmp/test1):

257585701---/user/tania_guerra/---12456;;;7992450;;;7812756;;;156173119;;;562456
918585703---/user/tania_guerra/---7867833;;;14473169;;;156173119

test.json:

{
"info": {
"source_objects": [
{
"type": "sub-category",
"id": 277438897
}
],
"item_type": "course",
"items": [
{
"_class": "course",
"id": 156173119,
"is_paid": null,
"trainer": [
{
"id": 257585701,
"url": "/user/tania_guerra/"
}
]
},
{
"_class": "course",
"id": 12456,
"is_paid": null,
"trainer": [
{
"id": 257585701,
"url": "/user/tania_guerra/"
}
]
}
]
}
}
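To see how the dict-of-lists answer aggregates across several files, here is a self-contained sketch that writes two small JSON files into a temporary directory and walks it. The file names, the contents of the second file (including its URL) and the helpers collect and make_course are hypothetical. Files are visited in sorted order so the result is deterministic, because os.walk itself does not guarantee any ordering.

```python
import json
import os
import tempfile


def collect(rootdir):
    """Walk rootdir and map (trainer id, url) -> list of course ids as strings."""
    trainers = {}
    for root, dirs, files in os.walk(rootdir):
        for filename in sorted(files):  # sort: os.walk gives no ordering guarantee
            if os.path.splitext(filename)[1] != ".json":
                continue
            with open(os.path.join(root, filename)) as stream:
                data = json.load(stream)
            for item in data["info"]["items"]:
                for trainer in item["trainer"]:
                    key = (trainer["id"], trainer["url"])
                    trainers.setdefault(key, []).append(str(item["id"]))
    return trainers


def make_course(course_id, trainer_id, url):
    """Build one hypothetical course entry shaped like the items in test.json."""
    return {"_class": "course", "id": course_id, "is_paid": None,
            "trainer": [{"id": trainer_id, "url": url}]}


# Two hypothetical input files: a.json mirrors test.json, b.json is made up.
inputs = {
    "a.json": [make_course(156173119, 257585701, "/user/tania_guerra/"),
               make_course(12456, 257585701, "/user/tania_guerra/")],
    "b.json": [make_course(7867833, 918585703, "/user/another_trainer/")],
}

with tempfile.TemporaryDirectory() as tmp:
    for name, items in inputs.items():
        with open(os.path.join(tmp, name), "w") as out:
            json.dump({"info": {"item_type": "course", "items": items}}, out)
    result = collect(tmp)

for (tid, url), cids in sorted(result.items()):
    print("%s---%s---%s" % (tid, url, ";;;".join(cids)))
```

With these inputs it should print one line per trainer: 257585701 with courses 156173119;;;12456, and 918585703 with course 7867833.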

I think the easiest approach is a dict, or better a defaultdict mapping int -> List[int].

Something like:

import json
from collections import defaultdict

with open(alljson, "r") as f:  # alljson: path to one JSON file
    data = json.load(f)

items = data["info"]["items"]
trainer_course_mapping = defaultdict(list)  # trainer id -> list of course ids
for item in items:
    for trainer in item["trainer"]:
        trainer_course_mapping[trainer["id"]].append(item["id"])
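One caveat with appending unconditionally: if the same course appears in more than one file (for example, if each file is a separate sub-category listing, which is an assumption here), its ID would be appended twice. Below is a minimal sketch of the defaultdict approach that skips duplicates while keeping first-seen order by tracking a set next to each list. The function name map_trainers and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_trainers(documents):
    """Map trainer id -> course ids in first-seen order, without duplicates.

    `documents` is an iterable of already-parsed JSON objects with the
    {"info": {"items": [...]}} shape from the question.
    """
    courses = defaultdict(list)  # trainer id -> ordered course ids
    seen = defaultdict(set)      # trainer id -> course ids already recorded
    for doc in documents:
        for item in doc["info"]["items"]:
            for trainer in item["trainer"]:
                tid, cid = trainer["id"], item["id"]
                if cid not in seen[tid]:
                    seen[tid].add(cid)
                    courses[tid].append(cid)
    return courses


# Hypothetical data: course 12456 appears in both documents.
doc_a = {"info": {"items": [
    {"id": 156173119, "trainer": [{"id": 257585701}]},
    {"id": 12456, "trainer": [{"id": 257585701}]},
]}}
doc_b = {"info": {"items": [
    {"id": 12456, "trainer": [{"id": 257585701}]},
]}}

mapping = map_trainers([doc_a, doc_b])
for tid, cids in mapping.items():
    print("%s---%s" % (tid, "***".join(map(str, cids))))  # 257585701---156173119***12456
```

If order does not matter, a plain defaultdict(set) is simpler; the extra list is only there to keep the courses in the order they were first encountered.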
