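I am trying to create a multi-index DataFrame that contains every possible index combination, including those that currently have no values. I want those missing values set to 0. To achieve this, I used the following commands: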
index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
grouped_df = df.groupby(by = index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(pd.MultiIndex.from_product(grouped_df.index.levels), fill_value = 0)
Expected result:
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|Retail| |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
| |____________|____________|______________|_______________|_____________|_____________|
| | Season | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
| | |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
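I get this result when each index level value has at least some data. However, if an index level value has no data at all, the result looks like this: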
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Not Special| Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
|Retail|____________|____________|______________|_______________|_____________|_____________|
| | Season |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
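For some reason, these values keep being truncated automatically. How can I fix the index so that the desired result is always produced and I can reliably use these indexes in calculations, even when there are no values for them?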
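What you can do is construct the required fixed index beforehand. `grouped_df.index.levels` only holds the values that actually occur in the data, so a product built from it can never include a value that is missing entirely. Instead, build the index from a dictionary, for example, where the keys are the column labels used as group indices and the values are all possible outcomes.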
index_levels = {
'Channel': ['Retail'],
'Duration': ['Month', 'Season'],
'Designation': ['Special', 'Not Special'],
'Manufacturing Class': ['Brand', 'Generic']
}
fixed_index = pd.MultiIndex.from_product(index_levels.values(), names=index_levels.keys())
Then you can do
grouped_df = df.groupby(by=list(index_levels))[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)
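To check the approach end to end, here is a minimal sketch. The sample `df` is made up for illustration (it is not the asker's data), and the `value_cols` name is an assumption introduced only to shorten the code. Only Month / Not Special rows exist in the sample, so the other six combinations should come back filled with zeros.

```python
import pandas as pd

# Hypothetical sample data: only 'Month' / 'Not Special' rows are present.
df = pd.DataFrame({
    'Channel': ['Retail', 'Retail', 'Retail'],
    'Duration': ['Month', 'Month', 'Month'],
    'Designation': ['Not Special', 'Not Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic', 'Generic'],
    'Total Purchases': [756, 7000, 896],
    'Sales': [100.0, 200.0, 300.0],
    'Cost': [50.0, 60.0, 70.0],
})

# All possible values per grouping column, independent of what the data holds.
index_levels = {
    'Channel': ['Retail'],
    'Duration': ['Month', 'Season'],
    'Designation': ['Special', 'Not Special'],
    'Manufacturing Class': ['Brand', 'Generic'],
}
value_cols = ['Total Purchases', 'Sales', 'Cost']  # assumed helper name

# Cartesian product of all level values: 1 * 2 * 2 * 2 = 8 rows.
fixed_index = pd.MultiIndex.from_product(
    list(index_levels.values()), names=list(index_levels)
)

grouped_df = df.groupby(by=list(index_levels))[value_cols].agg('sum')
# Align to the fixed index; combinations absent from the data are filled with 0.
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)

print(grouped_df)
```

Because the index no longer depends on the data, a lookup such as `grouped_df.loc[('Retail', 'Season', 'Special', 'Brand')]` works even when that combination never occurs in `df`.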