How do I extract only the entries identified by a value from a nested list in a JSON object?



Thank you very much for your help. I use json_normalize to access the nested list "terms" and to pull in "objectid" and "medium":

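import json
import pandas as pd

# Flatten each entry of "terms" into a row; keep objectid and medium as metadata.
with open('file_name.txt', 'r', encoding='utf8') as f:
    data = json.load(f)
df = pd.json_normalize(data, record_path=['terms'], meta=['objectid', 'medium'])
print(df)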

Instead of extracting every entry, I only want the entries whose termtype is "Classification" or "Materials".

"medium": "Handscroll; ink on paper",
"terms": [
    {
        "term": "Chinese",
        "aatid": 300018322,
        "id": 2033355,
        "termtype": "Culture"
    },
    {
        "term": "scroll paintings",
        "aatid": 300033666,
        "id": 2053034,
        "termtype": "Classification"
    },
    {
        "term": "handscrolls",
        "aatid": 300178463,
        "id": 2053038,
        "termtype": "Classification"
    },
    {
        "term": "tigers",
        "aatid": null,
        "id": 2127399,
        "termtype": "Subject"
    },
    {
        "term": "paper (fiber product)",
        "aatid": 300014109,
        "id": 2043813,
        "termtype": "Materials"
    },
    {
        "term": "ink",
        "aatid": 300015012,
        "id": 2167672,
        "termtype": "Materials"
    }
],
"objectid": "138361",

I want to print:

term                   aatid        id       termtype        objectid  medium
handscrolls            300178463.0  2053038  Classification  138361    Handscroll; ink on paper
scroll paintings       300033666.0  2053034  Classification  138361    Handscroll; ink on paper
paper (fiber product)  300014109.0  2043813  Materials       138361    Handscroll; ink on paper
ink                    300015012.0  2167672  Materials       138361    Handscroll; ink on paper

Your data is not valid JSON, so validate it first with https://jsonlint.com/. If everything else is fine, this should do the job:

df.loc[(df.termtype == "Classification") | (df.termtype == "Materials")]
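
The same filter can also be written with `Series.isin`, which tends to read better once there are more than two values to match. Below is a minimal, self-contained sketch. It assumes the fragment from the question is wrapped in braces so that it parses as a single JSON object, and the record is trimmed to four terms to keep it short. The names `raw`, `wanted` and `filtered` are only illustrative.

```python
import json

import pandas as pd

# Assumption: the fragment from the question, wrapped in { } so it is valid JSON
# and trimmed to four terms for brevity.
raw = """
{
  "objectid": "138361",
  "medium": "Handscroll; ink on paper",
  "terms": [
    {"term": "Chinese", "aatid": 300018322, "id": 2033355, "termtype": "Culture"},
    {"term": "scroll paintings", "aatid": 300033666, "id": 2053034, "termtype": "Classification"},
    {"term": "tigers", "aatid": null, "id": 2127399, "termtype": "Subject"},
    {"term": "ink", "aatid": 300015012, "id": 2167672, "termtype": "Materials"}
  ]
}
"""

# json.loads raises json.JSONDecodeError on malformed input, so this doubles
# as a local validity check before normalizing.
try:
    data = json.loads(raw)
except json.JSONDecodeError as err:
    raise SystemExit(f"Invalid JSON: {err}")

# One row per entry in "terms", with objectid and medium repeated on each row.
df = pd.json_normalize(data, record_path=["terms"], meta=["objectid", "medium"])

# Keep only the rows whose termtype is one of the wanted values.
wanted = ["Classification", "Materials"]
filtered = df[df["termtype"].isin(wanted)].reset_index(drop=True)
print(filtered)
```

If the real file holds a list of such objects rather than a single one, `json_normalize` accepts the list as well, and the filtering step stays the same.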
