Split a tuple of dictionaries into individual records in a dataframe

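I have a dataframe df. One of its columns, "match", holds data like the example below: each record contains a tuple of dictionaries. I want to create a new dataframe from the df match column, as shown in the output below. There I split each tuple into separate records, turn each dictionary key into a column, and add a "type" field with the value "a" to indicate that the two records were matched. I also want to add a typeId field so that every tuple gets an ID number, identifying which matched values came from the same original record. Can anyone suggest a way to do this?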

Code:

df['match'][0]

Data:

{'__class__': 'tuple',
 '__value__': [{'': '363336',
                'unitofmeasure': 'each',
                'product_id': '11',
                'classification': 'top',
                'Id': '363336'},
               {'': '368654',
                'unitofmeasure': 'each',
                'product_id': '10',
                'classification': 'bottom',
                'Id': '368654'}]}

Output:

        unitofmeasure  product_id  classification  Id      type  typeId
363336  each           11          top             363336  a     1
368654  each           10          bottom          368654  a     1
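For a single record, one possible way to get this shape is sketched below. It assumes the cell is a plain dict with a '__value__' list, as in the sample above, and that the empty-string key is meant to become the index; the variable names `record` and `out` are only illustrative.

```python
import pandas as pd

# Sample cell copied from the question (what df['match'][0] returns).
record = {
    "__class__": "tuple",
    "__value__": [
        {"": "363336", "unitofmeasure": "each", "product_id": "11",
         "classification": "top", "Id": "363336"},
        {"": "368654", "unitofmeasure": "each", "product_id": "10",
         "classification": "bottom", "Id": "368654"},
    ],
}

# One row per dictionary; the keys become columns.
out = pd.DataFrame(record["__value__"])

# Assumption: the column named '' is the row label in the desired output.
out = out.set_index("")
out.index.name = None

out["type"] = "a"    # marks the rows as matched
out["typeId"] = 1    # same id for every row of this tuple
print(out)
```

The values stay strings here, because that is how they appear in the sample data.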

import pandas as pd

# read record in from match
emptLst=[]
for i in range(len(df['match'].dropna())):

    df2=pd.DataFrame(df['match'][i]['__value__'])
    # add label column with value 'a'
    df2['label']='a'
    # df2.head()
    # add column id value based on row number from original dataframe
    df2['labeling_set_id']=i
    emptLst.append(df2)

for j in range(len(emptLst)):
    if j==0:
        dfm=emptLst[0]
    else:
        dfm=pd.concat([dfm,emptLst[j]])

# read record in from distinct
emptLst2=[]
for i in range(len(df['distinct'].dropna())):

    df3=pd.DataFrame(df['distinct'][i]['__value__'])
    # add label column with value 'b'
    df3['label']='b'
    # df3.head()
    # add column id value based on row number from original dataframe
    df3['labeling_set_id']=(i+len(df['distinct'].dropna()))
    emptLst2.append(df3)

for j in range(len(emptLst2)):
    if j==0:
        dfd=emptLst2[0]
    else:
        dfd=pd.concat([dfd,emptLst2[j]])

# combine both sets and make the ids start at 1
df_label=pd.concat([dfm,dfd])
df_label['labeling_set_id']=df_label['labeling_set_id']+1
df_label.head()
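The same idea could be written more compactly with a small helper that is reused for both columns. This is only a sketch under a few assumptions: `explode_tuples` is a hypothetical name, the toy `df` below is made up to mirror the sample record, and the ids for the "distinct" rows are assumed to continue after the last "match" id.

```python
import pandas as pd

def explode_tuples(series, label, start_id=1):
    """Hypothetical helper: one row per dictionary, plus label and group id."""
    frames = []
    # Iterating over the non-null cells avoids indexing the original
    # column by position, which can hit a missing value.
    for offset, cell in enumerate(series.dropna()):
        part = pd.DataFrame(cell["__value__"])
        part["label"] = label
        part["labeling_set_id"] = start_id + offset
        frames.append(part)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

def make_cell(first_id, second_id):
    # Toy cell shaped like the sample record in the question.
    return {"__class__": "tuple",
            "__value__": [{"Id": first_id, "unitofmeasure": "each"},
                          {"Id": second_id, "unitofmeasure": "each"}]}

# Made-up data: two matched tuples, one distinct tuple, one missing cell.
df = pd.DataFrame({
    "match": [make_cell("1", "2"), make_cell("3", "4")],
    "distinct": [make_cell("5", "6"), None],
})

dfm = explode_tuples(df["match"], "a")
# Assumption: distinct ids continue where the match ids stop.
next_id = int(df["match"].notna().sum()) + 1
dfd = explode_tuples(df["distinct"], "b", start_id=next_id)

df_label = pd.concat([dfm, dfd], ignore_index=True)
print(df_label)
```

Collecting the pieces in a list and calling pd.concat once replaces the loop that concatenates frame by frame, and ignore_index=True gives the result a fresh 0..n-1 index instead of repeated 0/1 labels.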
