Pandas: explode nested JSON into multiple rows

My DataFrame has the structure below. I want to explode the rows based on the nested values in the details column.

cust_id, name, details
101, Kevin, [{"id":1001,"country":"US","state":"OH"}, {"id":1002,"country":"US","state":"GA"}]
102, Scott, [{"id":2001,"country":"US","state":"OH"}, {"id":2002,"country":"US","state":"GA"}]

Expected output

cust_id, name, id, country, state
101, Kevin, 1001, US, OH
101, Kevin, 1002, US, GA
102, Scott, 2001, US, OH
102, Scott, 2002, US, GA

df = df.explode('details').reset_index(drop=True)
df = df.merge(pd.json_normalize(df['details']), left_index=True, right_index=True).drop('details', axis=1)
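For anyone who wants to run the two lines above, here is a minimal sketch built on the sample data from the question. It assumes the details column already holds Python lists of dicts rather than JSON strings. The names exploded, flat and result are only illustrative, and .tolist() is my addition so that json_normalize receives a plain list of dicts.

```python
import pandas as pd

# Sample frame from the question; details holds lists of dicts (assumption).
df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        [{"id": 1001, "country": "US", "state": "OH"},
         {"id": 1002, "country": "US", "state": "GA"}],
        [{"id": 2001, "country": "US", "state": "OH"},
         {"id": 2002, "country": "US", "state": "GA"}],
    ],
})

# One row per list item, with a fresh 0..n-1 index.
exploded = df.explode("details").reset_index(drop=True)

# One column per key found in the dicts.
flat = pd.json_normalize(exploded["details"].tolist())

# Pair the rows up by index label, then drop the column of raw dicts.
result = exploded.merge(flat, left_index=True, right_index=True).drop("details", axis=1)
print(result)
```

The resulting columns should be cust_id, name, id, country, state, with one row per nested object.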

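df.explode("details") repeats each row N times, where N is the number of items in that row's details list (if any), and places one item in each copy.
Because explode repeats rows, the original index labels (0 and 1) are copied to the new rows, so the index becomes 0, 0, 1, 1, which breaks the index-based merge that follows. reset_index() replaces the index with a fresh one starting at 0. drop=True is used because by default pandas keeps the old index as a new column; this discards it.
pd.json_normalize(df['details']) converts the column (where each row holds one JSON object) into a new DataFrame in which every unique key across the JSON objects becomes a column.
df.merge() merges the new DataFrame into the original one.
left_index=True and right_index=True tell pandas to match rows by index label on both sides instead of by a column, so row 0 of one frame is paired with row 0 of the other, and so on.
.drop('details', axis=1) removes the old details column that held the original objects.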
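To see why the reset_index step matters, the sketch below runs the merge with and without it on the same sample data. The variable names (exploded, flat, misaligned, aligned) are made up for illustration, and .tolist() is again my addition so that the normalized frame is certain to get a fresh 0..3 index.

```python
import pandas as pd

df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        [{"id": 1001, "country": "US", "state": "OH"},
         {"id": 1002, "country": "US", "state": "GA"}],
        [{"id": 2001, "country": "US", "state": "OH"},
         {"id": 2002, "country": "US", "state": "GA"}],
    ],
})

exploded = df.explode("details")
print(exploded.index.tolist())  # [0, 0, 1, 1]: labels copied from the source rows

# Built from a plain list, so its index is 0, 1, 2, 3.
flat = pd.json_normalize(exploded["details"].tolist())

# Without reset_index: labels 0 and 1 each match two left rows,
# while labels 2 and 3 on the right never match anything.
misaligned = exploded.merge(flat, left_index=True, right_index=True)

# With reset_index(drop=True): both sides are labelled 0..3, one match each.
aligned = exploded.reset_index(drop=True).merge(flat, left_index=True, right_index=True)

print(misaligned[["name", "id"]])
print(aligned[["name", "id"]])
```

In the misaligned frame both Kevin rows pick up id 1001 and both Scott rows pick up id 1002, and the 2001 and 2002 records are lost. The aligned frame keeps all four records with the right customer.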
