Pandas - JSON normalize inside a DataFrame



I want to split one column of a DataFrame into multiple columns.

I have a DataFrame with the following structure:


GroupId,SubGroups,Type,Name
-4781505553015217258,"{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -1785570219922840611, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'VOLVO'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -3670461095557699088, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'SCANIA'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 8683757391859854416, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'DRIVERS'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -8066654520755643389, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X - DECOMMISSION'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 4177323092254043025, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X-INSTALLATION'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -6088426161802844604, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'FORD'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 8512440039365422841, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'HEAVY VEHICLES'}",OrganisationGroup,CompanyXYZ

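I want to create a new DataFrame in which the SubGroups column is expanded into its components. Note that the keys from inside SubGroups become columns prefixed with SubGroup_: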

GroupId, SubGroup_GroupId, SubGroup_SubGroups, SubGroup_Type, SubGroup_Name, Type, Name
-4781505553015217258, -732592932641342965, [], 'DefaultSite', 'Default Site', OrganisationGroup, CompanyXYZ
-4781505553015217258, 8123255835936628631, [], 'SiteGroup', 'MERCEDES BENZ', OrganisationGroup, CompanyXYZ

I tried the following code:

for row in AllSubGroupsDF.itertuples():
    newDF = newDF.append(pd.io.json.json_normalize(row.SubGroups))

But it returns:

GroupId,SubGroups,Type,Name
-732592932641342965,[],DefaultSite,Default Site
8123255835936628631,[],SiteGroup,MERCEDES BENZ
-1785570219922840611,[],SiteGroup,VOLVO
-3670461095557699088,[],SiteGroup,SCANIA
8683757391859854416,[],SiteGroup,DRIVERS
-8066654520755643389,[],SiteGroup,X - DECOMMISSION
4177323092254043025,[],SiteGroup,X-INSTALLATION
-6088426161802844604,[],SiteGroup,FORD
8512440039365422841,[],SiteGroup,HEAVY VEHICLES
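The output above contains only the inner dictionary's fields because json_normalize is given row.SubGroups alone, so the outer GroupId, Type and Name never reach it. Since the loop runs without error, the SubGroups cells in the original frame are presumably already dicts (pd.json_normalize does not accept a plain string). A minimal sketch that keeps the row-by-row idea but merges the outer fields into each record is shown below. It swaps json_normalize for plain dict merging, which is enough here because the inner dicts are flat, and it builds the frame once from a list instead of calling DataFrame.append (deprecated in pandas 1.4 and removed in 2.0). The two sample rows come from the question; the names records and new_df are hypothetical.

```python
import pandas as pd

# Two sample rows from the question. Assumption: SubGroups already holds dicts,
# as the original loop implies.
df = pd.DataFrame({
    "GroupId": [-4781505553015217258, -4781505553015217258],
    "SubGroups": [
        {"GroupId": -732592932641342965, "SubGroups": [], "Type": "DefaultSite", "Name": "Default Site"},
        {"GroupId": 8123255835936628631, "SubGroups": [], "Type": "SiteGroup", "Name": "MERCEDES BENZ"},
    ],
    "Type": ["OrganisationGroup", "OrganisationGroup"],
    "Name": ["CompanyXYZ", "CompanyXYZ"],
})

records = []  # hypothetical name: one flat dict per output row
for row in df.itertuples(index=False):
    # Outer GroupId first, then the prefixed inner fields, then outer Type/Name,
    # so the column order matches the layout requested in the question.
    record = {"GroupId": row.GroupId}
    record.update({f"SubGroup_{key}": value for key, value in row.SubGroups.items()})
    record["Type"] = row.Type
    record["Name"] = row.Name
    records.append(record)

# Build the DataFrame once instead of appending inside the loop.
new_df = pd.DataFrame(records)
print(new_df)
```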

I want all of it together in one DataFrame, but I'm not sure how. Any help?
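One way to get everything into a single frame without a Python-level loop is to normalize all inner dicts in one pd.json_normalize call, prefix the resulting columns with add_prefix, and concatenate them back next to the outer columns. The sketch below assumes SubGroups may hold either real dicts or their string repr (as in the CSV dump above), so it parses strings with ast.literal_eval and passes dicts through unchanged. Only two sample rows are used; parsed, sub and new_df are hypothetical names.

```python
import ast

import pandas as pd

# Two sample rows in the shape shown in the question; here SubGroups is the
# string repr of a dict, as it would be after reading the CSV.
df = pd.DataFrame({
    "GroupId": [-4781505553015217258, -4781505553015217258],
    "SubGroups": [
        "{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}",
        "{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}",
    ],
    "Type": ["OrganisationGroup", "OrganisationGroup"],
    "Name": ["CompanyXYZ", "CompanyXYZ"],
})

# Parse each cell once; cells that are already dicts are passed through as-is.
parsed = df["SubGroups"].map(lambda v: ast.literal_eval(v) if isinstance(v, str) else v)

# Flatten all inner dicts in one call and prefix the new column names.
sub = pd.json_normalize(parsed.tolist()).add_prefix("SubGroup_")
sub.index = df.index  # align with the original rows before concatenating

# Place the new columns between GroupId and the outer Type/Name columns.
new_df = pd.concat([df[["GroupId"]], sub, df[["Type", "Name"]]], axis=1)
print(new_df)
```

The empty SubGroups lists are kept as list values, because json_normalize only flattens nested dicts unless a record_path is given.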

You can try using the ast module:

import pandas as pd
import ast
data = [[-4781505553015217258,"{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -1785570219922840611, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'VOLVO'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -3670461095557699088, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'SCANIA'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 8683757391859854416, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'DRIVERS'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -8066654520755643389, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X - DECOMMISSION'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 4177323092254043025, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X-INSTALLATION'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -6088426161802844604, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'FORD'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 8512440039365422841, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'HEAVY VEHICLES'}","OrganisationGroup","CompanyXYZ"]]
df = pd.DataFrame(data,columns=["GroupId","SubGroups","Type","Name"])
# Parse each string into a dict once, then pull out one key per new column
parsed = df["SubGroups"].map(ast.literal_eval)
df["SubGroup_GroupId"] = parsed.map(lambda x: x["GroupId"])
df["SubGroup_SubGroups"] = parsed.map(lambda x: x["SubGroups"])
df["SubGroup_Type"] = parsed.map(lambda x: x["Type"])
df["SubGroup_Name"] = parsed.map(lambda x: x["Name"])
print(df)

Hope this helps!
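The answer's code adds the four new columns after the existing ones and keeps the raw SubGroups string, so the result does not yet match the column layout requested in the question. A minimal sketch of that tidy-up step, using the same one-column-per-key idea on two sample rows from the question: it drops the original SubGroups column and reorders the rest. The use of operator.itemgetter in place of a lambda is a stylistic choice, not part of the original answer.

```python
import ast
import operator

import pandas as pd

# Two sample rows from the question, with SubGroups stored as strings.
df = pd.DataFrame(
    [[-4781505553015217258, "{'GroupId': -1785570219922840611, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'VOLVO'}", "OrganisationGroup", "CompanyXYZ"],
     [-4781505553015217258, "{'GroupId': -6088426161802844604, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'FORD'}", "OrganisationGroup", "CompanyXYZ"]],
    columns=["GroupId", "SubGroups", "Type", "Name"],
)

# Same idea as the answer: parse once, then one new column per inner key.
parsed = df["SubGroups"].map(ast.literal_eval)
for key in ["GroupId", "SubGroups", "Type", "Name"]:
    df[f"SubGroup_{key}"] = parsed.map(operator.itemgetter(key))

# Drop the raw string column and order the columns as requested in the question.
df = df.drop(columns="SubGroups")
df = df[["GroupId", "SubGroup_GroupId", "SubGroup_SubGroups", "SubGroup_Type", "SubGroup_Name", "Type", "Name"]]
print(df)
```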
