Given the following DataFrame and list of dictionaries:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict([
{'id': '912SAFD', 'key': 3, 'list_index': [0]},
{'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
{'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
{'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
{'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]
DataFrame output:
        id  key  list_index
0  912SAFD    3         [0]
1  812SAFD    4      [0, 1]
2  712SAFD    5         [2]
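Is it possible, ideally without an explicit loop, to go through the list in 'list_index' for each row, use each value as an index into the list of dictionaries, and create new columns from the values found in those dictionaries?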
Here is an example of the expected result:
        id  key  list_index  609090  609091  609092  609092_lang
0  912SAFD    3         [0]       b     NaN     NaN          NaN
1  812SAFD    4      [0, 1]       b       c     NaN          NaN
2  712SAFD    5         [2]     NaN     NaN       d           fr
If 'lang' is not empty, it should also be added to the DataFrame as a column, named by joining the color_id value, an underscore and the word lang. For example: 609092_lang.
Any help would be appreciated.
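The naming rule for the extra language columns can be expressed directly on a flat table of the design records. This is a minimal sketch, assuming each inner 'designs' list may hold one or more records; the names records, df_designs, has_lang and lang_columns are illustrative and not taken from the question.

```python
import pandas as pd

designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

# Flatten the nested lists into one record per design (illustrative names).
records = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(records)

# Only designs with a non-empty 'lang' get an extra column,
# named "<color_id>_lang" as described in the question.
has_lang = df_designs['lang'] != ''
lang_columns = (df_designs.loc[has_lang, 'color_id'].astype(str) + '_lang').tolist()

print(lang_columns)  # ['609092_lang']
```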
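# Flatten the nested dictionaries into one record per design and build a tidy DataFrame from them
designs = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(designs)
# Names for the language columns, e.g. '609092_lang' as requested
df_designs['lang_code'] = df_designs['color_id'].astype(str) + '_lang'
# Treat an empty 'lang' as missing so those columns can be dropped later (np.NaN no longer exists in NumPy 2)
df_designs['lang'] = df_designs['lang'].replace('', np.nan)
# One row per list element, then look each element up by position in df_designs
df = (df.explode('list_index')
        .astype({'list_index': int})
        .merge(df_designs, left_on='list_index', right_index=True))
df_color = df.pivot(index=['id', 'key'], columns='color_id', values='value')
df_lang = df.pivot(index=['id', 'key'], columns='lang_code', values='lang')
# Combine both wide tables and drop the language columns that are entirely empty
df = df_color.join(df_lang).reset_index().dropna(how='all', axis=1)
print(df)
Output:
>>>
        id  key  609090  609091  609092  609092_lang
0  712SAFD    5     NaN     NaN       d           fr
1  812SAFD    4       b       c     NaN          NaN
2  912SAFD    3       b     NaN     NaN          NaN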
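Both approaches rely on the same long-format intermediate table: explode() replaces the per-row loop over 'list_index', and the index-based merge performs the lookup into the list of dictionaries. The sketch below rebuilds only that step so it can be inspected on its own; long_df and records are illustrative names, and the explicit astype cast is an added precaution rather than something the pandas merge strictly requires here.

```python
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

records = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(records)  # default RangeIndex 0..n-1 = list positions

# explode() yields one row per list element (the original row label is repeated);
# the elements come out as object dtype, so cast them to int before the lookup.
long_df = df.explode('list_index').astype({'list_index': int})

# Each element is a position in `designs`, i.e. a row label of df_designs,
# so merging on the right-hand index performs the lookup without a Python loop.
long_df = long_df.merge(df_designs, left_on='list_index', right_index=True)

print(long_df[['id', 'key', 'list_index', 'color_id', 'value', 'lang']])
```

The row with list_index [0, 1] appears twice in long_df, once per design it refers to; pivoting then turns those rows back into one row per (id, key).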
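Alternatively, if you can work with a MultiIndex DataFrame instead of naming the columns yourself, it is simpler (starting again from the original df and designs):
# Flatten the nested dictionaries into one record per design and build a tidy DataFrame from them
designs = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(designs)
# Treat an empty 'lang' as missing so those columns can be dropped later
df_designs['lang'] = df_designs['lang'].replace('', np.nan)
df = (df.explode('list_index')
        .astype({'list_index': int})
        .merge(df_designs, left_on='list_index', right_index=True))
# Each (value/lang, color_id) pair becomes one MultiIndex column; all-empty ones are dropped
df = (df.pivot(index=['id', 'key'], columns='color_id', values=['value', 'lang'])
        .dropna(how='all', axis=1)
        .reset_index())
print(df)
Output:
>>>
               id  key   value                    lang
color_id                609090  609091  609092  609092
0         712SAFD    5     NaN     NaN       d      fr
1         812SAFD    4       b       c     NaN     NaN
2         912SAFD    3       b     NaN     NaN     NaN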
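If the flat names from the question (for example 609092_lang) are still wanted, the MultiIndex result can be flattened after the pivot. This is a minimal sketch reusing the MultiIndex pipeline; long_df, wide and the flattening comprehension are illustrative additions, not part of the original answer. Unlike the expected result in the question, it does not carry the list_index column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

records = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(records)
df_designs['lang'] = df_designs['lang'].replace('', np.nan)

long_df = (df.explode('list_index')
             .astype({'list_index': int})
             .merge(df_designs, left_on='list_index', right_index=True))

# MultiIndex columns: ('value', color_id) and ('lang', color_id); drop all-NaN ones.
wide = (long_df.pivot(index=['id', 'key'], columns='color_id', values=['value', 'lang'])
               .dropna(how='all', axis=1))

# Flatten: ('value', 609090) -> '609090', ('lang', 609092) -> '609092_lang'.
wide.columns = [str(color_id) if field == 'value' else f'{color_id}_{field}'
                for field, color_id in wide.columns]
wide = wide.reset_index()

print(wide)
```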