我有一个由1m的数据组成的JSON文件。我想根据id提取技能,这样每个id都有不同的技能。有人能告诉我如何从JSON文件中提取技能列并将其转换为数据框架吗?我只需要技能列表中三列中的技能。
我从JSON文件中附加了几行。
{
"id": "3d86309e-64f6-4df8-ba60-cce431870bfb",
"location": {
"city": "Assen",
"country": "Netherlands",
"longitude": 6.564228534698486,
"latitude": 52.99275207519531
},
"educations": [
{
"title": "Bachelor Bedrijfskundige Informatica",
"institution": "Hanzehogeschool Groningen / Hanze University of Applied Sciences Groningen",
"start_date": "2002-01-01",
"end_date": "2005-12-31",
"ongoing": false,
"edu_type_id": 16870377,
"edu_cat_id": 951006,
"level": 11
},
{
"title": "Bachelor Public Relations, Marketing, Communication",
"institution": "NHL Hogeschool",
"start_date": "1994-01-01",
"end_date": "1997-12-31",
"ongoing": false,
"edu_type_id": 953096,
"edu_cat_id": 951099,
"level": 11
},
{
"title": " ",
"institution": "Gomarus College",
"start_date": null,
"end_date": null,
"ongoing": false,
"edu_type_id": null,
"edu_cat_id": null,
"level": null
}
],
"work_experiences": [
{
"title": "medewerker ICT business development",
"company_name": "Woningcorporatie Actium",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2014-10-01",
"end_date": null,
"classification": {
"function_type": "Sales, business development and key account managers",
"function_type_id": 1568086,
"function_cat": "Sales, account and business development managers and representatives",
"function_cat_id": 1567386,
"level": 5
}
},
{
"title": "Functioneel Applicatiebeheerder",
"company_name": "Actium Assen",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2013-02-01",
"end_date": "2014-09-30",
"classification": {
"function_type": "Ict service and information managers",
"function_type_id": 1567953,
"function_cat": "Ict service and information managers",
"function_cat_id": 1567269,
"level": 5
}
},
{
"title": "Change Coördinator",
"company_name": "KPN Consulting",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2006-08-01",
"end_date": "2012-09-30",
"classification": {
"function_type": "Securities and finance dealers and brokers",
"function_type_id": 1567651,
"function_cat": "Finance, securities and investment staff",
"function_cat_id": 1567434,
"level": 6
}
},
{
"title": "Coordinator Automatisering",
"company_name": "Spinder Products",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2000-01-01",
"end_date": "2006-12-31",
"classification": {
"function_type": "Ict service and information managers",
"function_type_id": 1567953,
"function_cat": "Ict service and information managers",
"function_cat_id": 1567269,
"level": 6
}
}
],
"skills": [
{
"skill": "business development",
"skill_id": 972528,
"skill_type_id": 34097811
},
{
"skill": "Automatisering",
"skill_id": 1588585,
"skill_type_id": 954000
}
],
"languages": [
{
"language": "Dutch",
"proficiency": "native or bilingual proficiency"
},
{
"language": "English",
"proficiency": "professional working proficiency"
},
{
"language": "German",
"proficiency": "elementary proficiency"
}
],
"certificates": [],
"working_years": 20
}
我希望输出的格式为:
skill
business development,Automatisering
我从你的问题中得出结论,你只想要数据框架中每种技能的名称。
如果JSON在文件"data.json">
中,下面的代码将得到它from pandas import DataFrame
import json
with open('data.json') as file:
data = DataFrame([skill["skill"] for skill in json.loads(file.read())["skills"]])
print(data)
将从DataFrame "data">
打印以下内容0
0 business development
1 Automatisering
数据框中的第0列是"技能"。但是如果您想要一个不同的列,例如"skill_id"只需将上面代码中的skill["skill"]
替换为skill["skill_id"]
。
如果您只想要所有三列并且不希望过滤掉任何列,则代码更短
from pandas import DataFrame
import json
with open('data.json') as file:
data = DataFrame(json.loads(file.read())["skills"])
print(data)
我不知道为什么你需要一个DataFrame而不是一个普通的列表,特别是考虑到你只是得到一个字符串列表。
如果你只是想要一个技能列表,你可以运行
with open('data.json') as file:
data = [skill["skill"] for skill in json.loads(file.read())["skills"]]
print(data)
我只是删除了与DataFrame相关的部分。
JSON。