".loc" and ".iloc" with a MultiIndex'd DataFrame

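When indexing a MultiIndex'd DataFrame, .iloc seems to assume you are referencing the "inner level" of the index, while .loc looks at the outer level.

For example: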

import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)
# .loc looks at the outer index:
print(df.loc['qux'])
# df.loc['two'] would throw KeyError
              0        1        2        3
second                                    
one    -1.25388 -0.63775  0.90711 -1.42868
two    -0.14007 -0.86175 -0.25562 -2.79859
# while .iloc looks at the inner index:
print(df.iloc[-1])
0   -0.14007
1   -0.86175
2   -0.25562
3   -2.79859
Name: (qux, two), dtype: float64

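Two questions:

First, why? Is this a deliberate design decision?

Second, can I use .iloc to reference the outer level of the index and produce the result below? I know I could first find the last member of the index with get_level_values and then look it up with .loc, but I was wondering whether this can be done more directly, either with some fancy .iloc syntax or with an existing function designed specifically for this case.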

# df.iloc[-1]
qux   one     0.89071  1.75489  1.49564  1.06939
      two    -0.77271  0.79486  0.31427 -1.32627

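Yes, this is a deliberate design decision:

.iloc is a strict positional indexer; it does not regard the structure at all, only the first actual behavior. ... .loc does take into account the level behavior. [emphasis added]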
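To see what "strictly positional" means here, it helps to check what position -1 actually refers to. The sketch below rebuilds the frame from the question and shows that each position is one physical row, i.e. one full (first, second) tuple, rather than one outer-level label; so .iloc is not really looking at the inner level either.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Position -1 is simply the last of the 8 physical rows ...
last_row = df.iloc[-1]
print(last_row.name)  # ('qux', 'two')

# ... which is the last (first, second) tuple stored in the index
print(df.index[-1])   # ('qux', 'two')

# There are 8 positions (rows), not 4 (outer labels)
print(len(df), df.index.get_level_values('first').nunique())  # 8 4
```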

So the result asked for in the question cannot be obtained flexibly with .iloc. The closest workaround, used in several similar questions, is

print(df.loc[[df.index.get_level_values(0)[-1]]])
                    0        1        2        3
first second                                    
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859

Using double brackets keeps the first index level.
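A minimal sketch of the single- versus double-bracket difference, using the same data. It also shows DataFrame.xs with drop_level=False, which is an alternative not mentioned in the answer but which likewise keeps both levels.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Last label of the outer level, found by position
label = df.index.get_level_values(0)[-1]   # 'qux'

single = df.loc[label]     # scalar label: outer level is dropped
double = df.loc[[label]]   # list of labels: both levels are kept

# xs keeps the selected level too when asked to
same = df.xs(label, level='first', drop_level=False)

print(single.index.nlevels, double.index.nlevels)  # 1 2
print(double.index.tolist())  # [('qux', 'one'), ('qux', 'two')]
print(same.index.tolist())    # [('qux', 'one'), ('qux', 'two')]
```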

You can use:

df.iloc[[6, 7], :]
Out[1]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589

[6, 7] are the actual row positions of these rows, as shown below:

df.reset_index()
Out[]:
  first second         0         1         2         3
0   bar    one -1.085631  0.997345  0.282978 -1.506295
1   bar    two -0.578600  1.651437 -2.426679 -0.428913
2   baz    one  1.265936 -0.866740 -0.678886 -0.094709
3   baz    two  1.491390 -0.638902 -0.443982 -0.434351
4   foo    one  2.205930  2.186786  1.004054  0.386186
5   foo    two  0.737369  1.490732 -0.935834  1.175829
6   qux    one -1.253881 -0.637752  0.907105 -1.428681
7   qux    two -0.140069 -0.861755 -0.255619 -2.798589

This also works with df.iloc[[-2, -1], :] or df.iloc[range(-2, 0), :].
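Rather than hard-coding [6, 7], the positions can be computed from the outer level. A minimal sketch; outer_positions is a hypothetical helper name, and it simply collects the positions where the first level equals the given label.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

def outer_positions(frame, label):
    # Hypothetical helper: integer positions of rows whose outer label == label
    first = frame.index.get_level_values(0)
    return np.flatnonzero(first == label)

pos = outer_positions(df, 'qux')
print(pos)              # [6 7]
print(df.iloc[pos, :])  # same rows as df.iloc[[6, 7], :]
```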

EDIT: turning this into a more generic solution

A generic function can then be written:

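def multiindex_iloc(df, index):
    # Pick the index-th value of the outer level (levels[0] holds its unique values)
    label = df.index.levels[0][index]
    # get_loc on an outer label returns a slice (or boolean mask) covering all its rows
    return df.iloc[df.index.get_loc(label)]

multiindex_iloc(df, -1)
Out[]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589

multiindex_iloc(df, 2)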
Out[]:
                     0         1         2         3
first second
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
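One caveat: a MultiIndex keeps all of its defined level values even after rows are filtered out, so levels[0] can contain labels that no longer occur in the frame. The sketch below is a hypothetical variant (multiindex_iloc_by_appearance is an illustrative name) that counts outer labels in order of appearance among the rows actually present, under the assumption that this is the numbering you want.

```python
import numpy as np
import pandas as pd

def multiindex_iloc_by_appearance(df, index):
    # Hypothetical variant of multiindex_iloc: counts outer labels in order of
    # appearance and ignores level values that no longer occur in the rows.
    outer = df.index.get_level_values(0)
    label = outer.unique()[index]
    # Boolean mask keeps every row of that outer label, with both levels intact
    return df[outer == label]

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Filtering out 'foo' leaves it as an unused value in levels[0]
sub = df[df.index.get_level_values(0) != 'foo']
print(list(sub.index.levels[0]))  # ['bar', 'baz', 'foo', 'qux']

# levels[0][2] would be 'foo' here; the variant picks 'qux', the 3rd label present
print(multiindex_iloc_by_appearance(sub, 2).index.tolist())
# [('qux', 'one'), ('qux', 'two')]
```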

You can reorder the index with the swaplevel method before using loc.

df.swaplevel(0,-1).loc['two']

With the sample data from your question, it looks like this:

>>> df
                     0         1         2         3
first second                                        
bar   one    -1.085631  0.997345  0.282978 -1.506295
      two    -0.578600  1.651437 -2.426679 -0.428913
baz   one     1.265936 -0.866740 -0.678886 -0.094709
      two     1.491390 -0.638902 -0.443982 -0.434351
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589
>>> df.loc['bar']
               0         1         2         3
second                                        
one    -1.085631  0.997345  0.282978 -1.506295
two    -0.578600  1.651437 -2.426679 -0.428913
>>> df.swaplevel().loc['two']
              0         1         2         3
first                                        
bar   -0.578600  1.651437 -2.426679 -0.428913
baz    1.491390 -0.638902 -0.443982 -0.434351
foo    0.737369  1.490732 -0.935834  1.175829
qux   -0.140069 -0.861755 -0.255619 -2.798589

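swaplevel is a MultiIndex method, but you can call it directly on a DataFrame. By default it swaps the two innermost levels, so if your MultiIndex has more than two levels, you should state explicitly which levels to swap: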

df.swaplevel(0,-1).loc['two']
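A small illustration of the default versus explicit swap, on a hypothetical three-level index built only for this purpose. The sort_index() call is an addition of this sketch; it restores lexicographic order after the swap so that .loc does not index past the lexsort depth.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level index, just to show which levels get swapped
idx3 = pd.MultiIndex.from_product(
    [['bar', 'qux'], ['one', 'two'], ['x', 'y']],
    names=['first', 'second', 'third'],
)
df3 = pd.DataFrame(np.arange(16).reshape(8, 2), index=idx3)

# Default swaplevel() swaps the two innermost levels (i=-2, j=-1)
print(list(df3.swaplevel().index.names))      # ['first', 'third', 'second']

# Explicit positions: swap the outermost and innermost levels
swapped = df3.swaplevel(0, -1).sort_index()
print(list(swapped.index.names))              # ['third', 'second', 'first']

# .loc on the new outer level now selects by the former innermost level
print(swapped.loc['y'].shape)                 # (4, 2)
```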
