如何解析和平面化嵌套JSON API响应为表格格式



JSON结构:

{
"help": "https://data.boston.gov/api/3/action/help_show?name=datastore_search_sql",
"success": true,
"result": {
"records": [
{
"latitude": "42.38331999978103",
"property_type": "Residential 1-family",
"neighborhood": "Charlestown",
"description": "Improper storage trash: res",
"year built": "1885",
"_full_text": "'-11':2 '-23':3 '-71.06920000136572':29 '00':4,5,6 '02129':16 '1':26 '107':13 '1885':23 '201340000':19 '2017':24 '2022':1 '2129':18 '42.38331999978103':28 'baldwin':14 'charlestown':17 'enforcement':7 'family':27 'improper':9 'lia':20 'res':12 'residential':25 'ryan':21 'st':15 'storage':10 'trash':11 'v':22 'violations':8",
"longitude": "-71.06920000136572",
"owner": "LIA RYAN V",
"address": "107 Baldwin St, 02129",
"date": "2022-11-23T00:00:00",
"violation_type": "Enforcement Violations",
"_id": 1,
"year remodeled": "2017",
"parcel": "201340000",
"zip_code": "2129"
},
{
"latitude": "42.32762329872878",
"property_type":
...}
],
"fields": [
{
"type": "int4",
"id": "_id"
},
{
"type": "tsvector"...
}
],
"sql": "SELECT * from "dc615ff7-2ff3-416a-922b-f0f334f085d0" where date >= '2022-11-23'"
}
}

从Boston.gov网站收到API响应:

response = requests.request('GET', 'https://data.boston.gov/api/3/action/datastore_search_sql?sql=SELECT%20*%20from%20%22dc615ff7-2ff3-416a-922b-f0f334f085d0%22%20where%20date%20%3E=%20%272022-11-23%27')

所以有5个顶级键,但我只关心得到结果。记录成表格格式

相关字典(result.records)中的键:

json_data['result']['records'][0].keys()
dict_keys(['latitude', 'property_type', 'neighborhood', 'description', 'year built', '_full_text', 'longitude', 'owner', 'address', 'date', 'violation_type', '_id', 'year remodeled', 'parcel', 'zip_code'])

我得到的最接近的是使用扁平化json模块的flatten()的1x52数据帧,但这只是有每个结果。在一个单独的列中记录字典。

0   ...                                                 51
0  {'latitude': '42.38331999978103', 'property_ty...  ...  {'latitude': '42.38306999993893', 'property_ty...

使用json_normalize(两次)

with open(extracted_data_fn) as json_file:
# store file data in object
json_data = json.load(json_file)
print (json_data)
# using flatten_json module
flat_json = flatten_json.flatten(json_data)
df_flat = pd.DataFrame(flat_json, index = range(len(flat_json)))
df = pd.json_normalize(json_data)
df_result_records = pd.json_normalize(df['result.records'])
df_result_records

我的首选输出是键作为列,每个值作为行中的一个单元格。

对如何实现这一点有什么想法吗?谢谢你!

直接使用

json_data= response.json()
df=pd.json_normalize(json_data['result']['records'])
df
|    |   latitude | property_type        | neighborhood   | description                 |   year built | _full_text                                                                                                                                                                                                                                                                                                                           |   longitude | owner      | address               | date                | violation_type         |   _id |   year remodeled |       parcel |   zip_code |
|---:|-----------:|:---------------------|:---------------|:----------------------------|-------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------:|:-----------|:----------------------|:--------------------|:-----------------------|------:|-----------------:|-------------:|-----------:|
|  0 |    42.3833 | Residential 1-family | Charlestown    | Improper storage trash: res |         1885 | '-11':2 '-23':3 '-71.06920000136572':29 '00':4,5,6 '02129':16 '1':26 '107':13 '1885':23 '201340000':19 '2017':24 '2022':1 '2129':18 '42.38331999978103':28 'baldwin':14 'charlestown':17 'enforcement':7 'family':27 'improper':9 'lia':20 'res':12 'residential':25 'ryan':21 'st':15 'storage':10 'trash':11 'v':22 'violations':8 |    -71.0692 | LIA RYAN V | 107 Baldwin St, 02129 | 2022-11-23T00:00:00 | Enforcement Violations |     1 |             2017 |   2.0134e+08 |       2129 |
|  1 |    42.3276 | Ellipsis             | nan            | nan                         |          nan | nan                                                                                                                                                                                                                                                                                                                                  |    nan      | nan        | nan                   | nan                 | nan                    |   nan |              nan | nan          |        nan |

相关内容

  • 没有找到相关文章

最新更新