Is there a way to remove automatic truncation from a pandas DataFrame?



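I am trying to create a multi-indexed DataFrame that contains every possible index combination, including combinations that currently have no values. I want those missing values to be set to 0. To achieve this, I used the following commands: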

index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
grouped_df = df.groupby(by = index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(pd.MultiIndex.from_product(grouped_df.index.levels), fill_value = 0)

Expected result:

___________________________________________________________________________________________ 
|Chan. | Duration   | Designation|    Manufact. |Total Purchases|  Sales      |   Cost      |
|______|____________|____________|______________|_______________|_____________|_____________|
|      | Month      | Special    |    Brand     |     0         |    0.00     |   0.00      |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |    0.00     |   0.00      |
|Retail|            |____________|______________|_______________|_____________|_____________|
|      |            |Not Special |    Brand     |     756       | 15654.07    |   9498.23   |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     7896      |  98745.23   |    78953.56 |
|      |____________|____________|______________|_______________|_____________|_____________|
|      | Season     | Special    |    Brand     |     0         |  0.00       |    0.00     |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |  0.00       |    0.00     |
|      |            |____________|______________|_______________|_____________|_____________|
|      |            |Not Special |    Brand     |     0         |  0.00       |    0.00     |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |     0         |  0.00       |    0.00     |
|______|____________|____________|______________|_______________|_____________|_____________|

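This result is produced as long as every label of each index level occurs at least once in the data. However, if a label never occurs in the data (here, 'Special'), the following result is produced instead: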

___________________________________________________________________________________________ 
|Chan. | Duration   | Designation|    Manufact. |Total Purchases|  Sales      |   Cost      |
|______|____________|____________|______________|_______________|_____________|_____________|
|      | Month      | Not Special|    Brand     |     756       |  15654.07   |   9498.23   |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |    7896       | 98745.23    |   78953.56  |
|Retail|____________|____________|______________|_______________|_____________|_____________|
|      | Season     |Not Special |    Brand     |       0       |    0.00     |     0.00    |
|      |            |            |______________|_______________|_____________|_____________|
|      |            |            |    Generic   |       0       |    0.00     |     0.00    |
|______|____________|____________|______________|_______________|_____________|_____________|

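For some reason, these rows keep being truncated automatically. How can I fix the index so that it always produces the expected result, and so that I can reliably use these index combinations in calculations even when they contain no values?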

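The rows disappear because grouped_df.index.levels only holds labels that actually occur in the data, so the product of those levels cannot include a label that was never observed. What you can do is construct the desired fixed index beforehand, for example from a dictionary whose keys are the column labels used as group keys and whose values are all the possible labels for each of them.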

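import pandas as pd

# Every label that may appear in each grouping column, observed or not.
# 'Retail' is spelled as in the tables above so that it matches the data.
index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'],
    'Designation': ['Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
}
fixed_index = pd.MultiIndex.from_product(list(index_levels.values()), names=list(index_levels))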
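As a quick sanity check, the fixed index can be inspected before it is used. The following is a minimal sketch, assuming the channel label is 'Retail' as in the tables above; the names n_combinations and has_unseen are only illustrative.

```python
import pandas as pd

# All possible labels per grouping column (assumed from the tables above).
index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'],
    'Designation': ['Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
}
fixed_index = pd.MultiIndex.from_product(
    list(index_levels.values()), names=list(index_levels)
)

# The size is the product of the label counts (1 * 2 * 2 * 2),
# regardless of which combinations the data happens to contain.
n_combinations = len(fixed_index)

# A combination that may never occur in the data is still part of the index.
has_unseen = ('Retail', 'Season', 'Special', 'Brand') in fixed_index
```

Because the index is built from the dictionary rather than from the grouped data, its contents do not change when the data changes.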

Then you can do:

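# Pass the group keys as a plain list of column labels.
grouped_df = df.groupby(by=list(index_levels))[['Total Purchases', 'Sales', 'Cost']].agg('sum')
# Combinations missing from the data become rows filled with 0.
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)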
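To make the difference concrete, here is a small end-to-end sketch. The DataFrame below is made-up toy data (not the asker's data) in which only 'Month' / 'Not Special' rows exist; it compares an index built from the observed levels with the fixed index.

```python
import pandas as pd

# Toy data, invented for illustration: 'Season' and 'Special' never occur.
df = pd.DataFrame({
    'Channel': ['Retail', 'Retail', 'Retail'],
    'Duration': ['Month', 'Month', 'Month'],
    'Designation': ['Not Special', 'Not Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic', 'Generic'],
    'Total Purchases': [756, 7000, 896],
    'Sales': [100.0, 200.0, 50.0],
    'Cost': [60.0, 120.0, 30.0],
})

index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'],
    'Designation': ['Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
}
keys = list(index_levels)
value_cols = ['Total Purchases', 'Sales', 'Cost']

grouped = df.groupby(by=keys)[value_cols].sum()

# Approach from the question: the levels only contain observed labels,
# so unobserved labels cannot show up in the product.
observed_index = pd.MultiIndex.from_product(grouped.index.levels, names=keys)
from_observed = grouped.reindex(observed_index, fill_value=0)

# Approach from the answer: the index is fixed in advance.
fixed_index = pd.MultiIndex.from_product(list(index_levels.values()), names=keys)
from_fixed = grouped.reindex(fixed_index, fill_value=0)

unseen_key = ('Retail', 'Season', 'Special', 'Brand')
seen_key = ('Retail', 'Month', 'Not Special', 'Generic')
unseen_sales = from_fixed.loc[unseen_key, 'Sales']
seen_purchases = from_fixed.loc[seen_key, 'Total Purchases']
```

With this toy data, from_observed has only the two observed combinations, while from_fixed has all eight, with zeros for the ones that never occur.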
