Pandas在维护多索引的同时将字典分解为行



现在已经检查了大量的Stack Overflow线程,我正在努力将答案应用于我的特定用例,希望有人能帮助我解决特定问题。

我试图在维护多索引的同时,将字典中的数据分解为两列。

这是我目前拥有的:

| short_url | platform | css_problem_files                                                    |
|-----------|----------|----------------------------------------------------------------------|
| /url_1/   | desktop  | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
|           | mobile   | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| /url_2/   | desktop  | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
|           | mobile   | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |

以下是我想要实现的目标:

| short_url | platform | css_file   | css_value |
|-----------|----------|------------|-----------|
| /url_1/   | desktop  | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |
|           | mobile   | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |
| /url_2/   | desktop  | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |
|           | mobile   | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |

我唯一想到的与我所需要的非常接近的是以下内容,然而,这将创建超过20万行,而我预计它只有数千行(我还没有包括平台(:

m = pd.DataFrame([*df['css_problem_files']], df.index).stack()
.rename_axis([None,'css_files']).reset_index(1, name='pct usage')
out = df[['short_url']].join(m)

将非常感谢任何帮助或正确方向上的一点

如果你把字典变成键值对的列表,你可以分解它们,然后用.apply(pd.Series)把结果转换成两个新的列(并根据你的喜好重命名它们(,如下所示:

df = (df
.css_problem_files.apply(dict.items) # turn into key value list
.explode() # explode
.apply(pd.Series) # turn into columns
.rename(columns={0: "css_file", 1: "css_value"}) # rename
)

从Markdown表解析数据帧很烦人。如果它是在DataFrame构造函数中,它会容易得多。

df = pd.read_csv(io.StringIO('''short_url | platform | css_problem_files                                                   
/url_1/   | desktop  | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_1/   | mobile   | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/   | desktop  | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/   | mobile   | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}'''), sep=r's+|s+', index_col=[0, 1])
df['css_problem_files'] = df['css_problem_files'].apply(eval)

pd.DataFrame.apply(pd.Series)是实现这一点的最直接的方法,尽管它确实效率低下(请参阅答案以了解其说明(。这直接使用DataFrame构造函数,从而避免了性能问题。它通过将索引复制到新的数据帧来保留索引,并显式地提供列:

temp_df = df['css_problem_files'].apply(dict.items).explode()
pd.DataFrame(temp_df.to_list(), index=temp_df.index, columns=['css_file', 'css_value'])
css_file  css_value
short_url platform                       
/url_1/   desktop   css_file_1  css_value
desktop   css_file_2  css_value
desktop    css_fle_3  css_value
mobile    css_file_1  css_value
mobile    css_file_2  css_value
mobile     css_fle_3  css_value
/url_2/   desktop   css_file_1  css_value
desktop   css_file_2  css_value
desktop    css_fle_3  css_value
mobile    css_file_1  css_value
mobile    css_file_2  css_value
mobile     css_fle_3  css_value

最新更新