Create new DataFrame columns from a column of index lists that point into a list of dictionaries



Given the following DataFrame and list of dictionaries:

import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict([
{'id': '912SAFD', 'key': 3, 'list_index': [0]},
{'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
{'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]}, 
{'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]}, 
{'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

DataFrame output:

        id  key list_index
0  912SAFD    3        [0]
1  812SAFD    4     [0, 1]
2  712SAFD    5        [2]

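Is it possible, without explicit loops if possible, to go through the list in 'list_index' for each row, use each value as an index into the list of dictionaries, and then create new columns from the values in those dictionaries?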
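The step that removes the need for a row-by-row loop is turning each list in 'list_index' into separate rows, which `DataFrame.explode` does. A minimal sketch of just that step, assuming a reasonably recent pandas; the name `exploded` is illustrative:

```python
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])

# one row per element of each list; the original row label is repeated
exploded = df.explode('list_index')
# explode leaves an object column, so cast it before using the values as positions
exploded['list_index'] = exploded['list_index'].astype(int)
print(exploded)
```

Each exploded row can then be matched to an entry of `designs` by position (for example with a merge or join against a frame built from `designs`), and the repeated row labels make it possible to reshape the result back to one row per original row.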

Here is an example of the expected result:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b    NaN    NaN         NaN
1  812SAFD    4     [0, 1]      b      c    NaN         NaN
2  712SAFD    5        [2]    NaN    NaN      d          fr

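If 'lang' is not empty, it should also be added to the DataFrame as a column whose name is the color_id value joined with an underscore and the word lang, for example: 609092_lang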
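A loop-free sketch that aims to produce exactly the layout above, including the `609092_lang` naming, the `list_index` column and the original row order. It assumes a recent pandas and that each entry in `designs` holds exactly one inner dict, so an entry's position in the outer list is the index used in `list_index`. The names `flat`, `long`, `wide_value`, `wide_lang` and `result` are illustrative, not from the question:

```python
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

# one row per outer entry, so row i of `flat` is designs[i]
# (assumption: each 'designs' list holds exactly one dict)
flat = pd.DataFrame([entry['designs'][0] for entry in designs])
# treat empty language strings as missing
flat['lang'] = flat['lang'].mask(flat['lang'] == '')

# one row per referenced position, keeping the original row label
long = df[['list_index']].explode('list_index')
long['list_index'] = long['list_index'].astype(int)
# attach the design at each position (join matches list_index against flat's index)
long = long.join(flat, on='list_index').set_index('color_id', append=True)

# back to one row per original row, one column per color_id
wide_value = long['value'].unstack()
# keep only lang columns that have at least one non-empty value, named '<color_id>_lang'
wide_lang = long['lang'].unstack().dropna(axis=1, how='all')
wide_lang.columns = [f'{color_id}_lang' for color_id in wide_lang.columns]

result = df.join(wide_value).join(wide_lang)
print(result)
```

Joining the reshaped columns back onto the original `df` is what keeps `list_index` and the original row order; a pivot indexed on `id` and `key` would instead sort by `id` and drop `list_index`.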

Any help would be greatly appreciated.

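# flatten the inner 'designs' dicts into a tidy DataFrame; row i corresponds to designs[i]
# (this relies on each outer entry holding exactly one inner dict)
designs = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(designs)
# column label for the language pivot, e.g. 'lang_609092'
# (use df_designs['color_id'].astype(str) + '_lang' for the '609092_lang' form from the question)
df_designs['lang_code'] = 'lang_' + df_designs['color_id'].astype(str)
# empty language strings become NaN so all-empty lang columns can be dropped later
df_designs['lang'] = df_designs.lang.replace('', np.nan)
# one row per referenced index, then attach the matching design by position
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
# one column per color_id holding 'value', and one per lang_code holding 'lang'
df_color = df.pivot(index=['id', 'key'], columns=['color_id'], values='value')
df_lang = df.pivot(index=['id', 'key'], columns=['lang_code'], values='lang')
# combine, move id/key back to columns, and drop columns that are entirely NaN
df = df_color.join(df_lang).reset_index().dropna(how='all', axis=1)
print(df)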

Output:

>>>
        id  key 609090 609091 609092 lang_609092
0  712SAFD    5    NaN    NaN      d          fr
1  812SAFD    4      b      c    NaN         NaN
2  912SAFD    3      b    NaN    NaN         NaN

Alternatively, if you can work with a MultiIndex DataFrame instead of giving the columns custom names, it is simpler:
# start again from the original df and designs from the question
# flatten the inner 'designs' dicts into a tidy DataFrame; row i corresponds to designs[i]
designs = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(designs)
df_designs['lang'] = df_designs.lang.replace('', np.nan)
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
# pivot 'value' and 'lang' together: the columns become a (field, color_id) MultiIndex
df = (df.pivot(index=['id', 'key'], columns=['color_id'], values=['value', 'lang'])
        .dropna(how='all', axis=1)
        .reset_index())
print(df)

Output:

>>>
               id key  value                 lang
color_id              609090 609091 609092 609092
0         712SAFD   5    NaN    NaN      d     fr
1         812SAFD   4      b      c    NaN    NaN
2         912SAFD   3      b    NaN    NaN    NaN
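If flat column names turn out to be needed after all, the (field, color_id) MultiIndex columns from this version can be mapped to the naming the question asks for. A minimal sketch, assuming pandas 1.1 or later (for `pivot` with a list as `index`); the helper `flat_name` and the intermediate names are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

# same steps as the MultiIndex version above
flat = pd.DataFrame([info for design in designs for info in design['designs']])
flat['lang'] = flat['lang'].replace('', np.nan)
long = df.explode('list_index').astype({'list_index': int})
long = long.merge(flat, left_on='list_index', right_index=True)
wide = (long.pivot(index=['id', 'key'], columns='color_id', values=['value', 'lang'])
            .dropna(how='all', axis=1)
            .reset_index())

def flat_name(field, color_id):
    """Hypothetical helper: map a (field, color_id) column to a flat name."""
    if field in ('id', 'key'):
        return field  # columns restored by reset_index have an empty second level
    return color_id if field == 'value' else f'{color_id}_lang'

wide.columns = [flat_name(field, color_id) for field, color_id in wide.columns]
print(wide)
```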
