包含多个嵌套列表的Dict的扁平化列表使用pandas json_normalize



我们有以下列表-

users_info = [
{
"Id": "21",
"Name": "ABC",
"Country": {
"Name": "Country 1"
},
"Addresses": {
"records": [
{
"addressId": "12",
"line1": "xyz, 102",
"city": "PQR"
},
{
"addressId": "13",
"line1": "YTR, 102",
"city": "NMS"
}
]
},
"Education": {
"records": [
{
"Id": "45",
"Degree": "Bachelors"
},
{
"Id": "49",
"Degree": "Masters"
}
]
}

},
{
"Id": "26",
"Name": "PEW",
"Country": {
"Name": "Country 2"
},
"Addresses": {
"records": [
{
"addressId": "10",
"line1": "BTR, 12",
"city": "UYT"
},
{
"addressId": "123",
"line1": "MEQW, 6",
"city": "KJH"
}
]
},
"Education": {
"records": [
{
"Id": "45",
"Degree": "Bachelors"
},
{
"Id": "49",
"Degree": "Masters"
}
]
}

},
{
"Id": "214",
"Name": "TUF",
"Country": None,
"Addresses": {
"records": None
},
"Education": {
"records": [
{
"Id": "45",
"Degree": "Bachelors"
},
{
"Id": "49",
"Degree": "Masters"
}
]
}

},
{
"Id": "2609",
"Name": "JJU",
"Country": {
"Name": "Country 2"
},
"Addresses": {
"records": [
{
"addressId": "10",
"line1": "BTR, 12",
"city": "UYT"
},
{
"addressId": "123",
"line1": "MEQW, 6",
"city": "KJH"
}
]
},
"Education": None       
}       
]

我想把这个字典列表放平,并创建pandas数据帧-

Id | Name | Country.Name | Addresses.addressId | Addresses.line1 | Addresses.city | Education.Id | Education.Degree

有两个字典列表——地址和教育。这些(国家、地址和教育(中的任何一个都有可能是

如何使用以上数据创建数据帧以获得所需的格式?

以下是我尝试过的

dataset_1 = pandas.json_normalize([user for user in users_info],
record_path = ["Addresses"],
meta = ["Id", "Name", ["Country", "Name"]],
errors="ignore"
)
dataset_2 = pandas.json_normalize([user for user in users_info],
record_path = ["Education"],
meta = ["Id", "Name", ["Country", "Name"]],
errors="ignore"
)
dataset = pandas.concat([dataset_1, dataset_2], axis=1)

当dataset_1执行时,我得到-

NoneType object is not iterable error

您可以使用:

df = pd.json_normalize(users_info)
df_addresses = df['Addresses.records'].explode().apply(pd.Series)
df_addresses.rename(columns={col:f'Addresses.{col}' for col in df_addresses.columns}, inplace=True)
df_education = df['Education.records'].explode().apply(pd.Series)
df_education.rename(columns={col:f'Education.{col}' for col in df_education.columns}, inplace=True)
cols = [col for col in df.columns if col not in ['Addresses.records', 'Education.records']]
df = df[cols].join(df_addresses).join(df_education)
df.dropna(axis=1, how='all', inplace=True)
print(df)

OUTPUT

Id Name Country.Name Addresses.addressId Addresses.line1 Addresses.city Education.Degree Education.Id
0    21  ABC    Country 1                  12        xyz, 102            PQR        Bachelors           45
0    21  ABC    Country 1                  12        xyz, 102            PQR          Masters           49
0    21  ABC    Country 1                  13        YTR, 102            NMS        Bachelors           45
0    21  ABC    Country 1                  13        YTR, 102            NMS          Masters           49
1    26  PEW    Country 2                  10         BTR, 12            UYT        Bachelors           45
1    26  PEW    Country 2                  10         BTR, 12            UYT          Masters           49
1    26  PEW    Country 2                 123         MEQW, 6            KJH        Bachelors           45
1    26  PEW    Country 2                 123         MEQW, 6            KJH          Masters           49
2   214  TUF          NaN                 NaN             NaN            NaN        Bachelors           45
2   214  TUF          NaN                 NaN             NaN            NaN          Masters           49
3  2609  JJU    Country 2                  10         BTR, 12            UYT              NaN          NaN
3  2609  JJU    Country 2                 123         MEQW, 6            KJH              NaN          NaN

最新更新