How do I create a multi-index DataFrame from a nested dictionary?



I have a nested dictionary whose first-level keys are [0, 1, 2, ...], and the value for each key has this format:

{
    "geometry": {
        "type": "Point",
        "coordinates": [75.4516454, 27.2520587]
    },
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan"
    }
}

I want to create a pandas DataFrame of this form:

   Geometry                Type      Properties
   Type      Coordinates             State      Code   Name   Zone   Address
0  Point     [..., ...]    Feature   Rajasthan  BDHL   ...    ...    ...
1
2

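I can't make sense of the examples I found online about MultiIndex, nested DataFrames, or pivoting. None of them seem to use the first-level keys as the main index of the resulting DataFrame.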

How do I get from the data I have to a DataFrame in this format?

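I suggest creating the columns as "geometry_type", "geometry_coord", and so on, to distinguish them from the column you named "type". In other words, build each new name by using the first-level key as a prefix and the sub-key as the rest of the name. Then parse the data and fill your DataFrame like this: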

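import json
import pandas as pd

# json.loads() expects a JSON string, so read the file with json.load() instead
with open("your_json.json") as f:
    j = json.load(f)

rows = []
for key, record in j.items():  # first-level keys: 0, 1, 2, ...
    geometry = record.get("geometry", {})
    rows.append({
        "geometry_type": geometry.get("type"),
        "geometry_coord": geometry.get("coordinates"),
        # ... add the remaining prefixed columns the same way
    })

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
df = pd.DataFrame(rows, index=list(j.keys()))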

The output could look like this:

   geometry_type  geometry_coord              ...
0  Point          [75.4516454, 27.2520587]    ...

PS: If you really want the layout you originally asked for, you can look here: Giving a column multiple indexes/headers
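As a minimal sketch of the same prefix idea without a manual loop: pd.json_normalize accepts a sep argument, so passing sep="_" produces flat, prefixed column names directly. The in-memory dict named data below is an assumption standing in for your loaded JSON, and the generated name is "geometry_coordinates" rather than the shorter "geometry_coord" used above.

```python
import pandas as pd

# Assumed input: a dict keyed 0, 1, 2, ... as described in the question
# (only the single record from the question is used here)
data = {
    0: {
        "geometry": {"type": "Point", "coordinates": [75.4516454, 27.2520587]},
        "type": "Feature",
        "properties": {
            "state": "Rajasthan",
            "code": "BDHL",
            "name": "Badhal",
            "zone": "NWR",
            "address": "Kishangarh Renwal, Rajasthan",
        },
    },
}

# sep="_" joins parent and child keys, e.g. "geometry" + "type" -> "geometry_type"
flat = pd.json_normalize(list(data.values()), sep="_")

# Reuse the first-level keys as the row index
flat.index = list(data.keys())

print(flat.columns.tolist())
```

The top-level "type" key has no parent, so it keeps its plain name and cannot collide with "geometry_type".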

I assume you have a list of nested dictionaries.

Use json_normalize to read the JSON data, then use str.partition to split the current column index into two parts:

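import json
import pandas as pd

# Load the list of nested records from disk
with open('data.json') as f:
    data = json.load(f)

# Flatten nested keys into dotted column names such as 'geometry.type'
df = pd.json_normalize(data)
# Split each name at the first '.', then drop the level that holds the separator
df.columns = df.columns.str.partition('.', expand=True).droplevel(level=1)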

Output:

>>> df.columns
MultiIndex([(      'type',            ''),
            (  'geometry',        'type'),
            (  'geometry', 'coordinates'),
            ('properties',       'state'),
            ('properties',        'code'),
            ('properties',        'name'),
            ('properties',        'zone'),
            ('properties',     'address')],
           )
>>> df
      type geometry                           properties
               type               coordinates      state  code    name zone                       address
0  Feature    Point  [75.4516454, 27.2520587]  Rajasthan  BDHL  Badhal  NWR  Kishangarh Renwal, Rajasthan
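If the data is a dictionary keyed by 0, 1, 2, ... as in the question, rather than a list, the same approach should still work by normalizing the values and reusing the keys as the row index. This is a sketch under that assumption. The name record and the reuse of one record for both keys are placeholders for illustration only.

```python
import pandas as pd

# Placeholder record copied from the question
record = {
    "geometry": {"type": "Point", "coordinates": [75.4516454, 27.2520587]},
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan",
    },
}

# Assumed shape of the real data: first-level keys 0, 1, 2, ...
# (keys loaded from a JSON file would be the strings "0", "1", ...)
data = {0: record, 1: record}

# Flatten each value into one row with dotted column names
df = pd.json_normalize(list(data.values()))

# Use the first-level keys as the row index
df.index = list(data.keys())

# Split 'geometry.type' into ('geometry', '.', 'type') and drop the '.' level
df.columns = df.columns.str.partition(".", expand=True).droplevel(1)

print(df)
```

With partition, a column without a dot such as "type" gets an empty string as its second level.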

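You can use pd.json_normalize() to normalize the nested dictionary into a DataFrame df, then split the dotted column names into a MultiIndex by calling Index.str.split on df.columns with the parameter expand=True, as follows: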

Step 1: Normalize the nested dictionary into a DataFrame

import pandas as pd

j = {
    "geometry": {
        "type": "Point",
        "coordinates": [75.4516454, 27.2520587]
    },
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan"
    }
}
df = pd.json_normalize(j)

Step 1 result:

print(df)
      type geometry.type      geometry.coordinates properties.state properties.code properties.name properties.zone            properties.address
0  Feature         Point  [75.4516454, 27.2520587]        Rajasthan            BDHL          Badhal             NWR  Kishangarh Renwal, Rajasthan

Step 2: Create the MultiIndex column labels

df.columns = df.columns.str.split('.', expand=True)

Step 2 (final) result:

print(df)
      type geometry                           properties
       NaN     type               coordinates      state  code    name zone                       address
0  Feature    Point  [75.4516454, 27.2520587]  Rajasthan  BDHL  Badhal  NWR  Kishangarh Renwal, Rajasthan
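The NaN under "type" appears because that column name has no dot, so str.split leaves its second level empty. The sketch below shows one possible way to replace that NaN with an empty string and then select columns from the resulting MultiIndex. The list comprehension is an illustrative choice, not the only option.

```python
import pandas as pd

j = {
    "geometry": {"type": "Point", "coordinates": [75.4516454, 27.2520587]},
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan",
    },
}

df = pd.json_normalize(j)
df.columns = df.columns.str.split(".", expand=True)

# Replace the missing second-level label (NaN) with an empty string
df.columns = pd.MultiIndex.from_tuples(
    [(top, "" if pd.isna(sub) else sub) for top, sub in df.columns]
)

# Selecting a top-level key returns a sub-DataFrame of its children
props = df["properties"]

# Selecting a full (top, sub) tuple returns a single column as a Series
coords = df[("geometry", "coordinates")]

print(props.columns.tolist())
print(coords.iloc[0])
```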
