How to reindex with a MultiIndex



I have a DataFrame like this:

import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'var1': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0},
'var2': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0},
'var3': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0},
'var4': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0}})

I want to fill in the missing index labels, so I first used .reindex:

df.reindex(np.arange(1, 11))

and got:

    var1  var2  var3  var4
1    0.0   0.0   0.0   0.0
2    0.0   0.0   0.0   0.0
3    0.0   0.0   0.0   0.0
4    0.0   0.0   0.0   0.0
5    NaN   NaN   NaN   NaN
6    0.0   0.0   0.0   0.0
7    0.0   0.0   0.0   0.0
8    0.0   0.0   0.0   0.0
9    NaN   NaN   NaN   NaN
10   0.0   0.0   0.0   0.0
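A minimal sketch of what the flat .reindex call does, assuming a frame with the same layout as the question's (every value 0.0, labels 5 and 9 missing), built here with a scalar constructor for brevity. Rows are aligned by label: target labels missing from the old index become NaN rows, and old labels missing from the target (here 0) are dropped.

```python
import numpy as np
import pandas as pd

# Same layout as the question's frame: all values 0.0, labels 5 and 9 missing
df = pd.DataFrame(0.0, index=[0, 1, 2, 3, 4, 6, 7, 8, 10],
                  columns=["var1", "var2", "var3", "var4"])

# Align rows to the target labels 1..10 by label
filled = df.reindex(np.arange(1, 11))

# Label 0 is not in the target, so that row is dropped;
# labels 5 and 9 did not exist before, so their rows are all NaN
print(filled.index.tolist())
print(filled.loc[[5, 9]])
```

To keep the row labelled 0, the target would have to include it, for example np.arange(0, 11).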

However, I need to keep track of several index levels, and when I built a MultiIndex and passed it to .reindex, it did not work as I expected:

df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]))

      var1  var2  var3  var4
A 1    NaN   NaN   NaN   NaN
  2    NaN   NaN   NaN   NaN
  3    NaN   NaN   NaN   NaN
  4    NaN   NaN   NaN   NaN
  5    NaN   NaN   NaN   NaN
  6    NaN   NaN   NaN   NaN
  7    NaN   NaN   NaN   NaN
  8    NaN   NaN   NaN   NaN
  9    NaN   NaN   NaN   NaN
  10   NaN   NaN   NaN   NaN

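I really don't understand what is happening here, and the .reindex documentation isn't very clear to me either. Can anyone explain why a MultiIndex can't be passed to .reindex this way, or what I am doing wrong?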
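A short sketch of why the MultiIndex call returns only NaN, assuming the same all-zero stand-in for the question's df. Without level=, reindex matches each target label against the existing labels as a whole. Each label of the new MultiIndex is a 2-tuple such as ("A", 1), and no tuple equals any of the plain integer labels in df.index, so no row is found.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(0.0, index=[0, 1, 2, 3, 4, 6, 7, 8, 10],
                  columns=["var1", "var2", "var3", "var4"])
target = pd.MultiIndex.from_product([["A"], np.arange(1, 11)])

# Every label of the target is a whole tuple, e.g. ("A", 1)
first_label = target[0]

# None of those tuples is an existing (integer) label of df
common = set(target) & set(df.index)

# So the whole-label reindex matches nothing and every cell becomes NaN
result = df.reindex(target)
print(first_label, common)
print(result)
```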

Edit:

@jazrael gave a good solution for going from a single-level index to a 2-level MultiIndex. But what about reindexing from a 2-level MultiIndex to a 3-level MultiIndex?

For example:

df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index])

      var1  var2  var3  var4
1 0    0.0   0.0   0.0   0.0
  1    0.0   0.0   0.0   0.0
  2    0.0   0.0   0.0   0.0
  3    0.0   0.0   0.0   0.0
2 4    0.0   0.0   0.0   0.0
  6    0.0   0.0   0.0   0.0
  7    0.0   0.0   0.0   0.0
  8    0.0   0.0   0.0   0.0
  10   0.0   0.0   0.0   0.0
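For reference, np.repeat([1, 2], [4, 5]) repeats 1 four times and 2 five times. This gives an outer level of length 9 that lines up with the nine existing labels. A small sketch of the pairs it produces; the level names key1/key2 are illustrative here, since the question leaves the levels unnamed:

```python
import numpy as np
import pandas as pd

inner = pd.Index([0, 1, 2, 3, 4, 6, 7, 8, 10])  # the question's original labels
outer = np.repeat([1, 2], [4, 5])               # 1 four times, then 2 five times

# The repeat counts must add up to len(inner) for from_arrays to pair them
idx = pd.MultiIndex.from_arrays([outer, inner], names=["key1", "key2"])
pairs = list(idx)  # (key1, key2) tuples in row order
print(pairs)
```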

and I want to get:

        var1  var2  var3  var4
A 1 0    0.0   0.0   0.0   0.0
    1    0.0   0.0   0.0   0.0
    2    0.0   0.0   0.0   0.0
    3    0.0   0.0   0.0   0.0
  2 4    0.0   0.0   0.0   0.0
    5    NaN   NaN   NaN   NaN
    6    0.0   0.0   0.0   0.0
    7    0.0   0.0   0.0   0.0
    8    0.0   0.0   0.0   0.0
    9    NaN   NaN   NaN   NaN
    10   0.0   0.0   0.0   0.0

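Because you are reindexing a plain (single-level) index rather than a MultiIndex, you need to pass level=1 so that the existing labels are matched against the second level of the new MultiIndex: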

df = df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]), level=1)
print (df)
      var1  var2  var3  var4
A 1    0.0   0.0   0.0   0.0
  2    0.0   0.0   0.0   0.0
  3    0.0   0.0   0.0   0.0
  4    0.0   0.0   0.0   0.0
  5    NaN   NaN   NaN   NaN
  6    0.0   0.0   0.0   0.0
  7    0.0   0.0   0.0   0.0
  8    0.0   0.0   0.0   0.0
  9    NaN   NaN   NaN   NaN
  10   0.0   0.0   0.0   0.0
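The level= argument broadcasts the flat labels across the chosen level of the target, so the same call should also work when the outer level has several keys. A sketch, assuming the all-zero stand-in frame and a hypothetical second outer key "B". Each outer key receives its own copy of the matched rows, with NaN at 5 and 9, and label 0 is dropped because the target level starts at 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(0.0, index=[0, 1, 2, 3, 4, 6, 7, 8, 10],
                  columns=["var1", "var2", "var3", "var4"])

# "B" is a hypothetical extra outer key, added only to show the broadcast
target = pd.MultiIndex.from_product([["A", "B"], np.arange(1, 11)])

# Match df's flat labels against level 1 (the second level) of the target
result = df.reindex(target, level=1)
print(result)
```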

You can create a new index with the extra level and do an explicit DataFrame join to get what you want:

df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index], names=["key1", "key2"])
# If df already has the two-level index, just name its levels: df.rename_axis(["key1", "key2"], inplace=True)
new_index = pd.MultiIndex.from_arrays([['A'] * 11, np.repeat([1, 2], [4, 7]), range(11)],
                                      names=["new_key", *df.index.names])
# Join on the overlapping index levels, matched by name
output = pd.DataFrame(index=new_index).join(df, on=df.index.names)

Output:

                   var1  var2  var3  var4
new_key key1 key2
A       1    0      0.0   0.0   0.0   0.0
             1      0.0   0.0   0.0   0.0
             2      0.0   0.0   0.0   0.0
             3      0.0   0.0   0.0   0.0
        2    4      0.0   0.0   0.0   0.0
             5      NaN   NaN   NaN   NaN
             6      0.0   0.0   0.0   0.0
             7      0.0   0.0   0.0   0.0
             8      0.0   0.0   0.0   0.0
             9      NaN   NaN   NaN   NaN
             10     0.0   0.0   0.0   0.0
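A different route to the same result, sketched as an alternative to the join rather than as the answer's own method. First reindex the existing two-level index against a complete two-level target. Both sides have the same number of levels, so whole (key1, key2) tuples match. Then prepend the constant outer level with pd.concat using a dict key. The key1/key2 mapping of the target (key2 0 to 3 under key1 1, 4 to 10 under key1 2) is assumed from the desired output, and the frame is the all-zero stand-in.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(0.0, index=[0, 1, 2, 3, 4, 6, 7, 8, 10],
                  columns=["var1", "var2", "var3", "var4"])
df.index = pd.MultiIndex.from_arrays(
    [np.repeat([1, 2], [4, 5]), df.index], names=["key1", "key2"])

# Complete two-level target: key2 = 0..10, key1 = 1 for 0..3 and 2 for 4..10
target = pd.MultiIndex.from_arrays(
    [np.repeat([1, 2], [4, 7]), np.arange(11)], names=["key1", "key2"])

# Same number of levels on both sides, so reindex matches whole tuples
filled = df.reindex(target)

# The dict key "A" becomes a new outermost level named "new_key"
output = pd.concat({"A": filled}, names=["new_key"])
print(output)
```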
