How do I make a stacked (tidy) DataFrame where the index is filled in on every row?



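I need to take a wide MultiIndex DataFrame and stack it into tidy form so that another program can plot it. I solved that part in another question. However, the exported file ends up with merged cells for the stacked index levels. I need every index value repeated on each row so that the other program doesn't read the merged parts as "null".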

We have a set of products, years, colors and sizes as the index, with Sales figures as the data. After the final stack, the DataFrame looks like this:

# Minimum Working Example of incoming data in wide format
import pandas as pd
import numpy as np
colhead = ["Small Black", "Small White", "Small Brown", "Medium Black", "Medium White", "Medium Brown", "Large Black", "Large White", "Large Brown"]
rowhead = pd.MultiIndex.from_product([['sofa','table','chair'],[2011, 2012, 2013, 2014, 2015]])
df_mix = pd.DataFrame(np.random.randint(1, 10, size=(15, 9)), index=rowhead, columns=colhead)
# Relabel the flat columns as a two-level MultiIndex and name every level
hierarch1 = ["Small", "Small", "Small", "Medium", "Medium", "Medium", "Large", "Large", "Large"]
hierarch2 = ["Black", "White", "Brown", "Black", "White", "Brown", "Black", "White", "Brown"]
df_mixfix = df_mix.copy()  # copy, so df_mix keeps its flat column labels
df_mixfix.columns = pd.MultiIndex.from_arrays([hierarch1, hierarch2], names=['Size', 'Color'])
df_mixfix.index.names = ['Product', 'Year']
# Stack for tidy data: first Color (innermost column level), then Size
stk = df_mixfix.stack()
df_stk = stk.stack()
print(df_stk)
Product  Year  Color  Size  
sofa     2011  Black  Large     1
                      Medium    5
                      Small     1
               Brown  Large     9
                      Medium    4
                               ..
chair    2015  Brown  Medium    6
                      Small     5
               White  Large     8
                      Medium    1
                      Small     6
Length: 135, dtype: int32

Notice the extra blank space (tabs) where the MultiIndex groups the rows, first by Year and then by Product.
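That blank space is only how pandas prints a MultiIndex; every label is still stored on every row. A minimal sketch that rebuilds a smaller version of the data (the fixed random seed is an assumption, added only so the run is reproducible) and reads the Product level back with `get_level_values`:

```python
import numpy as np
import pandas as pd

# Rebuild the wide frame with named row and column levels.
rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
rowhead = pd.MultiIndex.from_product(
    [["sofa", "table", "chair"], [2011, 2012, 2013, 2014, 2015]],
    names=["Product", "Year"],
)
cols = pd.MultiIndex.from_product(
    [["Small", "Medium", "Large"], ["Black", "White", "Brown"]],
    names=["Size", "Color"],
)
df = pd.DataFrame(rng.integers(1, 10, size=(15, 9)), index=rowhead, columns=cols)

# Stack Color, then Size, into the index, as in the question.
df_stk = df.stack().stack()

# The printed output blanks out repeated labels, but the index itself
# holds an explicit Product value for every one of the 135 rows.
products = df_stk.index.get_level_values("Product")
print(list(products[:3]))
```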

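Regardless of the grouping, I need it to look like this, with every row of each group filled in with its index values, so that when I export to Excel the tidy data is not merged (this is the actual problem: the export looks like the output above rather than the one below):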

Product  Year  Color  Size  
sofa     2011  Black  Large     1
sofa     2011  Black  Medium    5
sofa     2011  Black  Small     1
sofa     2011  Brown  Large     9
sofa     2011  Brown  Medium    4
..
chair    2015  Brown  Medium    6
chair    2015  Brown  Small     5
chair    2015  White  Large     8
chair    2015  White  Medium    1
chair    2015  White  Small     6
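Another way to get this layout is to move the index levels into ordinary columns with `Series.reset_index`. A minimal sketch, assuming the value column should be called `Sales` (the stacked Series in the question has no name) and using the same assumed random seed as above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
rowhead = pd.MultiIndex.from_product(
    [["sofa", "table", "chair"], [2011, 2012, 2013, 2014, 2015]],
    names=["Product", "Year"],
)
cols = pd.MultiIndex.from_product(
    [["Small", "Medium", "Large"], ["Black", "White", "Brown"]],
    names=["Size", "Color"],
)
df = pd.DataFrame(rng.integers(1, 10, size=(15, 9)), index=rowhead, columns=cols)
df_stk = df.stack().stack()

# reset_index() turns each index level into a regular column, so every row
# carries its own Product/Year/Color/Size values. "Sales" is an assumed name.
tidy = df_stk.reset_index(name="Sales")
print(tidy.head())

# With a plain RangeIndex there is nothing hierarchical left to merge:
# tidy.to_excel("Tidy.xlsx", sheet_name="Sales", index=False)
# (writing .xlsx needs an Excel engine such as openpyxl installed)
```

Because `tidy` is an ordinary flat DataFrame, every cell in the Product, Year, Color and Size columns holds an explicit value.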

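What I have tried: I was able to make the display in the Jupyter notebook non-sparse, but the export to Excel still ends up merged. I had to update pandas to 1.5.2 and then use the following: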

pd.set_option("display.multi_sparse", False)  # printed (repr) output only
pd.set_option("styler.sparse.index", False)   # fills in rows, but only in Styler output
with pd.ExcelWriter('Tidy.xlsx') as writer:
    df_stk.to_excel(writer, sheet_name='Sales')  # still has merged cells
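Both options only change how pandas renders the index (as plain text, or in Styler output for `styler.sparse.index`); `to_excel` does not read them, which is why the cells stayed merged. A minimal sketch on a hypothetical three-row Series showing that `display.multi_sparse` changes only the printed labels:

```python
import pandas as pd

# Hypothetical three-row Series with the same four index levels.
s = pd.Series(
    [1, 5, 1],
    index=pd.MultiIndex.from_tuples(
        [
            ("sofa", 2011, "Black", "Large"),
            ("sofa", 2011, "Black", "Medium"),
            ("sofa", 2011, "Black", "Small"),
        ],
        names=["Product", "Year", "Color", "Size"],
    ),
)

# Default rendering sparsifies: repeated outer labels are printed once.
sparse_text = s.to_string()

# Temporarily disable sparsification for text output only; the data and
# the Excel writer's merge_cells default are unaffected.
with pd.option_context("display.multi_sparse", False):
    dense_text = s.to_string()

print(sparse_text)
print(dense_text)
```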

How do I get each explicit level element of the hierarchical key on every row in Excel?

Got it! The examples didn't help, but the information buried in the documentation is that I just needed to add the argument merge_cells=False.

So the full block inside the writer is:

with pd.ExcelWriter('Tidy.xlsx') as writer:
    df_stk.to_excel(writer, sheet_name='Sales', merge_cells=False)  # Fully tidy data!

Note: this turns off merging for all hierarchical levels.
