I have a nested dictionary whose first-level keys are [0, 1, 2, ...], and the value for each key has this form:
{
    "geometry": {
        "type": "Point",
        "coordinates": [75.4516454, 27.2520587]
    },
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan"
    }
}
I want to create a pandas DataFrame of the form:
   Geometry              Type     Properties
   Type   Coordinates             State      Code  Name  Zone  Address
0  Point  [..., ...]     Feature  Rajasthan  BDHL  ...   ...   ...
1
2
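I can't make sense of the examples online about MultiIndex / nested DataFrames / pivoting. None of them seem to use the first-level keys as the main index of the resulting DataFrame.
How do I get from the data I have to a DataFrame in this format?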
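I would suggest naming the columns "geometry_type", "geometry_coord", and so on, to distinguish them from the column you named "type". In other words, use the first-level key as a prefix and the sub-key as the name, which gives you a new column name. Then parse the data and fill your DataFrame like this: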
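import json
import pandas as pd

# "your_json.json" is a placeholder; json.load reads a file, json.loads parses a string
with open("your_json.json") as f:
    j = json.load(f)

rows = []
for k, v in j.items():
    # k is a first-level key ("0", "1", ...), v is one nested record
    geometry = v.get("geometry", {})
    rows.append({
        "geometry_type": geometry.get("type"),
        "geometry_coord": geometry.get("coordinates"),
        # ...
    })
# DataFrame.append was removed in pandas 2.0, so build the frame in one call
df = pd.DataFrame(rows, index=list(j.keys()))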
The output could look like this:
  geometry_type            geometry_coord ...
0         Point  [75.4516454, 27.2520587] ...
PS: If you really want the layout you originally asked for, you can check here: Giving a column multiple indexes/headers
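The prefixing idea above can be sketched with a small recursive helper, so that every sub-key gets its parent key as a prefix without listing the columns by hand. This is only a sketch under assumptions: `flatten` is a hypothetical helper name, the data is assumed to be a dict keyed by 0, 1, 2, ... as in the question, and the generated name is "geometry_coordinates" rather than the shortened "geometry_coord".

```python
import pandas as pd


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with '_' (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that sub-keys inherit the parent key as a prefix
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Sample shaped like the question's data, with a single first-level key
data = {
    0: {
        "geometry": {"type": "Point", "coordinates": [75.4516454, 27.2520587]},
        "type": "Feature",
        "properties": {"state": "Rajasthan", "code": "BDHL", "name": "Badhal",
                       "zone": "NWR", "address": "Kishangarh Renwal, Rajasthan"},
    },
}

# One flattened row per first-level key; the keys become the row index
df = pd.DataFrame([flatten(v) for v in data.values()], index=list(data.keys()))
print(df.columns.tolist())
```

Because the top-level "type" keeps its own name while the nested one becomes "geometry_type", the two no longer collide.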
I assume you have a list of nested dictionaries.
Use json_normalize to read the JSON data, then use str.partition to split the current column index into two parts:
import pandas as pd
import json
data = json.load(open('data.json'))
df = pd.json_normalize(data)
df.columns = df.columns.str.partition('.', expand=True).droplevel(level=1)
Output:
>>> df.columns
MultiIndex([( 'type', ''),
( 'geometry', 'type'),
( 'geometry', 'coordinates'),
('properties', 'state'),
('properties', 'code'),
('properties', 'name'),
('properties', 'zone'),
('properties', 'address')],
)
>>> df
      type  geometry                            properties
                type               coordinates       state  code    name  zone                       address
0  Feature     Point  [75.4516454, 27.2520587]   Rajasthan  BDHL  Badhal   NWR  Kishangarh Renwal, Rajasthan
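If the data is a dictionary keyed by 0, 1, 2, ... (as the question describes) rather than a list, the same approach should still work by normalizing the values and reusing the keys as the row index. The following is a rough sketch under that assumption. The second record is a made-up placeholder, included only so the index has more than one row.

```python
import pandas as pd

# Record 0 is the sample from the question; record 1 is a made-up placeholder.
data = {
    0: {
        "geometry": {"type": "Point", "coordinates": [75.4516454, 27.2520587]},
        "type": "Feature",
        "properties": {"state": "Rajasthan", "code": "BDHL", "name": "Badhal",
                       "zone": "NWR", "address": "Kishangarh Renwal, Rajasthan"},
    },
    1: {
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        "type": "Feature",
        "properties": {"state": "N/A", "code": "XXXX", "name": "Placeholder",
                       "zone": "N/A", "address": "N/A"},
    },
}

# Normalize the records, then reuse the first-level keys as the row index.
df = pd.json_normalize(list(data.values()))
df.index = list(data.keys())

# Split "geometry.type" into ("geometry", "type"); level 1 holds the "." separator.
df.columns = df.columns.str.partition(".", expand=True).droplevel(level=1)
print(df.loc[0, ("properties", "code")])
```

Note that if the dictionary comes from a JSON file, its keys will be strings ("0", "1", ...) rather than integers, so the index would need converting if integer labels matter.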
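You can use pd.json_normalize() to normalize the nested dictionary into a DataFrame df, then split the dotted column names in df.columns into a MultiIndex using Index.str.split with the parameter expand=True, as follows: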
Step 1: Normalize the nested dictionary into a DataFrame
import pandas as pd

j = {
    "geometry": {
        "type": "Point",
        "coordinates": [75.4516454, 27.2520587]
    },
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan"
    }
}
df = pd.json_normalize(j)
Step 1 result:
print(df)
type geometry.type geometry.coordinates properties.state properties.code properties.name properties.zone properties.address
0 Feature Point [75.4516454, 27.2520587] Rajasthan BDHL Badhal NWR Kishangarh Renwal, Rajasthan
Step 2: Create the MultiIndex column labels
df.columns = df.columns.str.split('.', expand=True)
Step 2 (final) result:
print(df)
      type  geometry                            properties
       NaN      type               coordinates       state  code    name  zone                       address
0  Feature     Point  [75.4516454, 27.2520587]   Rajasthan  BDHL  Badhal   NWR  Kishangarh Renwal, Rajasthan
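As the result shows, str.split leaves NaN in the second level for keys that have no sub-key (here, "type"). A possible follow-up, sketched below, rebuilds the column labels with an empty string in place of NaN and then selects from the MultiIndex columns. The rebuild with MultiIndex.from_tuples is my own suggestion rather than part of the steps above.

```python
import pandas as pd

j = {
    "geometry": {"type": "Point", "coordinates": [75.4516454, 27.2520587]},
    "type": "Feature",
    "properties": {"state": "Rajasthan", "code": "BDHL", "name": "Badhal",
                   "zone": "NWR", "address": "Kishangarh Renwal, Rajasthan"},
}

df = pd.json_normalize(j)
df.columns = df.columns.str.split(".", expand=True)

# Keys without a sub-key ("type") get NaN in level 1; replace it with "".
df.columns = pd.MultiIndex.from_tuples(
    [(top, sub if isinstance(sub, str) else "") for top, sub in df.columns]
)

print(df["properties"])          # all sub-columns under "properties"
print(df[("geometry", "type")])  # a single column, selected by tuple
```

Selecting by the top-level label returns a regular DataFrame of the sub-columns, while a full tuple returns a single Series.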