熊猫提取嵌套数据



>我有以下数据结构

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EMP_ID     ¦ ATTR1_OLD_VAL ¦                                                                                        ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL ¦ ATTR3_NEW_VAL   ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+-------------------------------------------------------------------------------------------¦
¦      E001 ¦ xyz           ¦ [{"codeTs":"12345567 ","goodsAttrName":"test1"},{"codeTs":"6402910000","goodsAttrName":"test2"}]      ¦ mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
E002 ¦ 1234          ¦                                                                                                       ¦ 123           ¦ jjj           ¦ iii           ¦ bb             ¦
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

我想将其处理成以下数据

+---------------------------------------------------------------------------------------------------------------------------------------------+
EMP_ID     ¦ ATTR1_OLD_VAL ¦ codeTs     ¦  goodsAttrName  ¦ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL    ¦ ATTR3_NEW_VAL  ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+------------------------------------¦
¦      E001 ¦ xyz           ¦ 12345567   ¦  test1          ¦               ¦   mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
¦      E001 ¦ xyz           ¦ 6402910000 ¦  test2          ¦               ¦   mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
¦      E002 ¦ 1234          ¦            ¦                 ¦               ¦   123           ¦  jjj          ¦ iii           ¦ bb             ¦                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------+

我试过以下代码不起作用

import pymongo as pm
import pandas as pd
import numpy as np
import datetime as dt
from bson import json_util
from pandas.io.json import json_normalize
import json
client = pm.MongoClient('mongodb://user1:user1@127.0.0.1:27017')
db = client['my_db']
mongo_data = list(db['ATTR1_NEW_VAL'].find({}))
sanitized = json.loads(json_util.dumps(mongo_data))
normalized = json_normalize(sanitized)
df = pd.DataFrame(normalized)
print(df)

试试这个:

首先使用df.explode将字典列表分解为行,然后将字典值转换为列apply(pd.Series)

df = df.explode('ATTR1_NEW_VAL')
print(pd.concat([df.drop(['ATTR1_NEW_VAL'], axis=1), df['ATTR1_NEW_VAL'].apply(pd.Series)], axis=1))

最新更新