I have a pandas DataFrame with a MultiIndex on the columns, and I want to select all columns under bar plus one additional column under baz. How can I do this?
Input:
import numpy as np
import pandas as pd
np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)
first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
0      -1.085631  0.997345  0.282978 -1.506295 -0.578600  1.651437 -2.426679 -0.428913
1       1.265936 -0.866740 -0.678886 -0.094709  1.491390 -0.638902 -0.443982 -0.434351
2       2.205930  2.186786  1.004054  0.386186  0.737369  1.490732 -0.935834  1.175829
3      -1.253881 -0.637752  0.907105 -1.428681 -0.140069 -0.861755 -0.255619 -2.798589
Expected output:
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
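I can list the exact columns, as in df.loc[:, [("bar", "one"), ("bar", "two"), ("baz", "two")]], but I can't seem to ask for everything under one top-level label alongside a single column, as in df.loc[:, ["bar", ("baz", "two")]]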
iloc solution
ix = df.columns.get_loc
df.iloc[:, np.r_[ix('bar'), ix(('baz', 'two'))]]
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
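
To make it clearer why the iloc approach works, here is a minimal sketch that rebuilds the frame from the question and inspects the intermediate values. It assumes the column index is unique and lexsorted (as it is here), in which case get_loc returns a slice for a partial key and a single integer for a full key. The names bar_pos, baz_pos, positions and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the sketch is self-contained.
np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

ix = df.columns.get_loc
bar_pos = ix("bar")           # partial key: a slice covering every 'bar' column
baz_pos = ix(("baz", "two"))  # full key: one integer position

# np.r_ expands the slice and appends the integer into one flat position array.
positions = np.r_[bar_pos, baz_pos]
result = df.iloc[:, positions]

print(bar_pos, baz_pos, positions)
print(result.columns.tolist())
```

If the column index were not sorted or contained duplicates, get_loc could return a boolean mask instead, so this sketch should not be assumed to carry over unchanged.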
If you don't want to type everything out, you can select the pieces separately and then combine them with pd.concat.
bar = df.loc[:, ["bar"]]
baz = df.loc[:, [('baz','two')]]
pd.concat([bar,baz], axis=1)
Out[451]:
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
Another way is to select everything under bar and baz, then drop the column you don't need:
df.loc(axis = 1)[['bar','baz']].drop(columns=('baz','one'))
Out[24]:
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
You can also preselect the relevant columns with a list comprehension:
columns = [(left, right)
           for left, right in df.columns
           if left == 'bar'
           or (left, right) == ('baz', 'two')]
df.loc[:, columns]
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681
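
The same filter can probably also be written as a boolean mask over the columns rather than a Python-level loop. The following is only a sketch of that variation, not something taken from the answers here: it combines get_level_values("first").isin(...) for the whole bar group with MultiIndex.isin(...) for the single (baz, two) column. The names mask and result are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the sketch is self-contained.
np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

# True for every column whose top level is 'bar' ...
is_bar = df.columns.get_level_values("first").isin(["bar"])
# ... or whose full key is exactly ('baz', 'two').
is_baz_two = df.columns.isin([("baz", "two")])

mask = is_bar | is_baz_two
result = df.loc[:, mask]

print(mask)
print(result.columns.tolist())
```

A mask keeps the columns in their original order, which matches the expected output here because bar already comes before baz.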
With pyjanitor's select_columns, you can pass a callable, which in this case mimics the list comprehension above:
# pip install pyjanitor
import janitor  # importing janitor registers select_columns on DataFrame
df.select_columns(lambda df: df.name[0] == 'bar'
                  or df.name == ('baz', 'two'))
first        bar                 baz
second       one       two       two
0      -1.085631  0.997345 -1.506295
1       1.265936 -0.866740 -0.094709
2       2.205930  2.186786  0.386186
3      -1.253881 -0.637752 -1.428681