现在已经检查了大量的Stack Overflow线程,我正在努力将答案应用于我的特定用例,希望有人能帮助我解决特定问题。
我试图在维护多索引的同时,将字典中的数据分解为两列。
这是我目前拥有的:
| short_url | platform | css_problem_files |
|-----------|----------|----------------------------------------------------------------------|
| /url_1/ | desktop | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| | mobile | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| /url_2/ | desktop | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| | mobile | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
以下是我想要实现的目标:
| short_url | platform | css_file | css_value |
|-----------|----------|------------|-----------|
| /url_1/ | desktop | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
| | mobile | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
| /url_2/ | desktop | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
| | mobile | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
我唯一想到的与我所需要的非常接近的是以下内容,然而,这将创建超过20万行,而我预计它只有数千行(我还没有包括平台(:
m = pd.DataFrame([*df['css_problem_files']], df.index).stack()
.rename_axis([None,'css_files']).reset_index(1, name='pct usage')
out = df[['short_url']].join(m)
将非常感谢任何帮助或正确方向上的一点
如果你把字典变成键值对的列表,你可以分解它们,然后用.apply(pd.Series)
把结果转换成两个新的列(并根据你的喜好重命名它们(,如下所示:
df = (df
.css_problem_files.apply(dict.items) # turn into key value list
.explode() # explode
.apply(pd.Series) # turn into columns
.rename(columns={0: "css_file", 1: "css_value"}) # rename
)
从Markdown表解析数据帧很烦人。如果它是在DataFrame构造函数中,它会容易得多。
df = pd.read_csv(io.StringIO('''short_url | platform | css_problem_files
/url_1/ | desktop | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_1/ | mobile | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/ | desktop | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/ | mobile | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}'''), sep=r's+|s+', index_col=[0, 1])
df['css_problem_files'] = df['css_problem_files'].apply(eval)
pd.DataFrame.apply(pd.Series)
是实现这一点的最直接的方法,尽管它确实效率低下(请参阅答案以了解其说明(。这直接使用DataFrame构造函数,从而避免了性能问题。它通过将索引复制到新的数据帧来保留索引,并显式地提供列:
temp_df = df['css_problem_files'].apply(dict.items).explode()
pd.DataFrame(temp_df.to_list(), index=temp_df.index, columns=['css_file', 'css_value'])
css_file css_value short_url platform /url_1/ desktop css_file_1 css_value desktop css_file_2 css_value desktop css_fle_3 css_value mobile css_file_1 css_value mobile css_file_2 css_value mobile css_fle_3 css_value /url_2/ desktop css_file_1 css_value desktop css_file_2 css_value desktop css_fle_3 css_value mobile css_file_1 css_value mobile css_file_2 css_value mobile css_fle_3 css_value