Pandas MultiIndex: get all columns under one level-0 label plus one specific column



I have a pandas DataFrame with a MultiIndex on the columns, and I want to access all columns under bar plus one additional column under baz. How can I achieve this?

Input:

import numpy as np
import pandas as pd
np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)
first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
0      -1.085631  0.997345  0.282978 -1.506295 -0.578600  1.651437 -2.426679 -0.428913
1       1.265936 -0.866740 -0.678886 -0.094709  1.491390 -0.638902 -0.443982 -0.434351
2       2.205930  2.186786  1.004054  0.386186  0.737369  1.490732 -0.935834  1.175829
3      -1.253881 -0.637752  0.907105 -1.428681 -0.140069 -0.861755 -0.255619 -2.798589

Expected output:

first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681

I can list the specific columns explicitly, as in df.loc[:, [("bar", "one"), ("bar", "two"), ("baz", "two")]], but I can't seem to mix a whole top-level label with a specific column, as in df.loc[:, ["bar", ("baz", "two")]]

An iloc solution

ix = df.columns.get_loc
df.iloc[:, np.r_[ix('bar'), ix(('baz', 'two'))]]

first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
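
To see why this works, it can help to inspect what get_loc returns. The following is a minimal sketch, assuming the same df as in the question, whose columns are sorted because they come from MultiIndex.from_product. In that case a level-0 label maps to a slice and a full tuple maps to a single integer, and np.r_ joins both into one array of positions. The variable names bar_pos, baz_two_pos and positions are illustrative only. On an unsorted or duplicated column index, get_loc may return a boolean mask instead, so this pattern would likely need adjusting there.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

bar_pos = df.columns.get_loc("bar")               # slice covering both "bar" columns
baz_two_pos = df.columns.get_loc(("baz", "two"))  # one integer position

# np.r_ expands the slice and appends the integer, giving a flat position array.
positions = np.r_[bar_pos, baz_two_pos]
result = df.iloc[:, positions]

print(bar_pos, baz_two_pos, positions)
print(result.columns.tolist())
```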

If you don't want to type everything out, you can make the selections separately and then combine them with pd.concat

bar = df.loc[:, ["bar"]]
baz = df.loc[:, [('baz','two')]]
pd.concat([bar,baz], axis=1)
Out[451]: 
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
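
The same idea can be wrapped in a small helper when there are more than two pieces. This is only a sketch: select_mixed is a hypothetical name, not a pandas function, and it assumes every key is either a level-0 label or a full column tuple that exists in the frame.

```python
import numpy as np
import pandas as pd


def select_mixed(frame, keys):
    """Concatenate column selections; each key is a level-0 label or a full tuple.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Wrapping each key in a list keeps every piece a DataFrame
    # that still carries both column levels.
    parts = [frame.loc[:, [key]] for key in keys]
    return pd.concat(parts, axis=1)


# Rebuild the frame from the question.
np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

result = select_mixed(df, ["bar", ("baz", "two")])
print(result.columns.tolist())
```

The columns come back in the order the keys are given, so reordering the list reorders the output.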

Another approach is to select everything under bar and baz, then drop the column you don't need:

df.loc(axis = 1)[['bar','baz']].drop(columns=('baz','one'))
Out[24]:
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681

You can also pre-select the relevant columns with a list comprehension:

columns = [(left, right)
           for left, right in df.columns
           if left == 'bar'
           or (left, right) == ('baz', 'two')]
df.loc[:, columns]
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
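
A vectorized variant of the same filter is possible with a boolean mask over the columns. This is an alternative sketch rather than part of the answer above: it assumes the column levels are named "first" and "second" as in the question, and it relies on Index.get_level_values and MultiIndex.isin; the names first and mask are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

# Level-0 label of every column, e.g. "bar", "bar", "baz", ...
first = df.columns.get_level_values("first")

# True for every "bar" column, plus the single ("baz", "two") column.
mask = (first == "bar") | df.columns.isin([("baz", "two")])

result = df.loc[:, mask]
print(result.columns.tolist())
```

Unlike the concat approach, a mask always keeps the columns in their original order.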

With pyjanitor's select_columns, you can pass a callable; in this case it mimics the list comprehension above:

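# pip install pyjanitor
import janitor  # importing janitor registers select_columns on DataFrame
# the callable is evaluated per column; col.name is that column's (first, second) tuple
df.select_columns(lambda col: col.name[0] == 'bar'
                  or col.name == ('baz', 'two'))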
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
