How to get MongoDB nested documents into a Python pandas DataFrame in table format


    name  age  address
1  Steve   27  {"number": 4, "street": "Main Road", "city": "Oxford"}
2   Adam   32  {"number": 78, "street": "High St", "city": "Cambridge"}

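You can use pd.DataFrame to expand the JSON/dict values in the address column into a DataFrame whose columns are the dict keys. Then use .join() to attach it to the original DataFrame, as follows: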

Optional step: if the JSON/dict values are actually strings, convert them to proper dicts first. Otherwise, skip this step.

import ast
df['address'] = df['address'].map(ast.literal_eval)
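If it is not clear whether the column holds strings or dicts, a small wrapper can accept both. This is only a sketch: parse_address is a hypothetical helper name, and trying json.loads before ast.literal_eval is an assumption about how the strings were produced (strict JSON with double quotes versus a Python repr with single quotes).

```python
import ast
import json


def parse_address(value):
    """Hypothetical helper: return a dict for dict, JSON-string, or Python-literal input."""
    if isinstance(value, dict):
        return value  # already a dict, nothing to do
    try:
        # Strict JSON: double-quoted keys, true/false/null
        return json.loads(value)
    except json.JSONDecodeError:
        # Python literal syntax: single quotes, True/False/None
        return ast.literal_eval(value)


samples = [
    '{"number": 4, "street": "Main Road", "city": "Oxford"}',    # JSON string
    "{'number': 78, 'street': 'High St', 'city': 'Cambridge'}",  # Python repr string
    {"number": 4, "street": "Main Road", "city": "Oxford"},      # already a dict
]
parsed = [parse_address(s) for s in samples]
```

With a DataFrame, the same helper could be applied as df['address'] = df['address'].map(parse_address).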

Main code:

import pandas as pd
df[['name', 'age']].join(pd.DataFrame(df['address'].tolist(), index=df.index).add_prefix('address.'))

Result:

    name  age  address.number address.street address.city
1  Steve   27               4      Main Road       Oxford
2   Adam   32              78        High St    Cambridge
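A different route to the same dotted column names, not the one used in the answer above, is pd.json_normalize, which flattens nested dicts in a list of records. The sketch below assumes the documents are available as plain Python dicts, for example as returned by a MongoDB query with _id excluded; the records list simply repeats the sample data.

```python
import pandas as pd

# Sample records shaped like the MongoDB documents (assumed input)
records = [
    {"name": "Steve", "age": 27,
     "address": {"number": 4, "street": "Main Road", "city": "Oxford"}},
    {"name": "Adam", "age": 32,
     "address": {"number": 78, "street": "High St", "city": "Cambridge"}},
]

# Nested keys are joined to their parent with sep, which defaults to "."
flat = pd.json_normalize(records, sep=".")
print(flat)
```

Note that json_normalize builds a fresh DataFrame with a default integer index starting at 0, so the original index labels 1 and 2 are not carried over.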

Alternatively, if you only need a few columns from the JSON/dict, you can add them one at a time with the string accessor str[], as follows:

df['address.number'] = df['address'].str['number']
df['address.street'] = df['address'].str['street']
df['address.city'] = df['address'].str['city']

Setup:

import pandas as pd
data = {'name': {1: 'Steve', 2: 'Adam'},
        'age': {1: 27, 2: 32},
        'address': {1: {"number": 4, "street": "Main Road", "city": "Oxford"},
                    2: {"number": 78, "street": "High St", "city": "Cambridge"}}}
df = pd.DataFrame(data)

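Depending on the use case, it may make more sense to set up an aggregation pipeline and $project the necessary nested fields to the top level: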

df = pd.DataFrame(db.collection_name.aggregate([{
    '$project': {
        '_id': 0,
        'name': '$name',
        'age': '$age',
        # Raise sub-documents to top level under new names
        'address_number': '$address.number',
        'address_street': '$address.street',
        'address_city': '$address.city'
    }
}]))

df:

    name  age  address_number address_street address_city
0  Steve   27               4      Main Road       Oxford
1   Adam   32              78        High St    Cambridge
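When several nested fields follow the same naming pattern, the $project stage can be generated instead of typed out. The sketch below is one possible way to do that; build_projection is a hypothetical helper, not part of pymongo, and the underscore separator is an assumption chosen to match the column names above.

```python
def build_projection(top_level, nested, sep="_"):
    """Hypothetical helper: build a $project spec that lifts nested fields to the top level.

    top_level: list of top-level field names to keep
    nested:    mapping of parent field -> list of child field names
    """
    projection = {"_id": 0}  # drop the ObjectId column
    for field in top_level:
        projection[field] = f"${field}"
    for parent, children in nested.items():
        for child in children:
            # e.g. 'address_number': '$address.number'
            projection[f"{parent}{sep}{child}"] = f"${parent}.{child}"
    return projection


projection = build_projection(
    ["name", "age"],
    {"address": ["number", "street", "city"]},
)
pipeline = [{"$project": projection}]
```

The resulting pipeline can then be passed to db.collection_name.aggregate(pipeline) in the same way as the hand-written stage.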

Alternatively, if there are too many fields to handle manually, we can use $replaceRoot together with $mergeObjects:

df = pd.DataFrame(db.collection_name.aggregate([
    {'$replaceRoot': {'newRoot': {'$mergeObjects': ["$$ROOT", "$address"]}}},
    {'$project': {'_id': 0, 'address': 0}}
]))

df:

    name  age  number     street       city
0  Steve   27       4  Main Road     Oxford
1   Adam   32      78    High St  Cambridge
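Because the merged columns lose their address prefix, it helps to see what the merge does to a single document. The following is a pure-Python imitation of the two pipeline stages, not MongoDB code; merge_root_with_address is a hypothetical name, and the second document with a name key inside address is made up to show the collision case. As with $mergeObjects, fields from the later object overwrite fields with the same name from the earlier one.

```python
def merge_root_with_address(doc):
    """Imitate $replaceRoot + $mergeObjects(["$$ROOT", "$address"]) on one dict."""
    merged = {**doc, **doc["address"]}  # keys from address win on collision
    merged.pop("address")               # mirrors {'$project': {'address': 0}}
    merged.pop("_id", None)             # mirrors {'$project': {'_id': 0}}
    return merged


doc = {"name": "Steve", "age": 27,
       "address": {"number": 4, "street": "Main Road", "city": "Oxford"}}
flat = merge_root_with_address(doc)

# Made-up document: the sub-document reuses the key "name"
clash = {"name": "Steve", "age": 27,
         "address": {"name": "Home", "city": "Oxford"}}
clash_flat = merge_root_with_address(clash)  # top-level "name" is overwritten
```

If such name collisions are possible in the real collection, the explicit $project approach with prefixed names avoids them.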

collection_name setup:

# Drop the collection if it exists
db.collection_name.drop()
# Insert sample documents
db.collection_name.insert_many([{
    'name': 'Steve', 'age': 27,
    'address': {"number": 4, "street": "Main Road", "city": "Oxford"}
}, {
    'name': 'Adam', 'age': 32,
    'address': {"number": 78, "street": "High St", "city": "Cambridge"}
}])
