Pandas:访问多索引列Dataframe中不同顶级列索引下的多个列



我在找出我想要刮掉并输出到csv文件的表上标题的索引时遇到了麻烦,所以我需要在ResidualMaturityLast下分类的列,我只能获得表的主标题而不是子标题。我尝试过使用df[('Yield', 'Last'),但只能获得该特定列而不是两者。

import pandas as pd
import requests
url = 'http://www.worldgovernmentbonds.com/country/japan/'
r = requests.get(url)
df_list = pd.read_html(r.text, flavor='html5lib')
df = df_list[4]
yc = df[["ResidualMaturity", "Yield"]]
print(yc)

电流输出

ResidualMaturity    Yield                   
ResidualMaturity     Last    Chg 1M   Chg 6M
0           1 month  -0.114%   +9.0 bp  +7.4 bp
1          3 months  -0.109%    0.0 bp  -1.9 bp
2          6 months  -0.119%   -0.3 bp  -1.9 bp
3          9 months  -0.119%  +10.0 bp  +9.9 bp
4            1 year  -0.125%   -0.7 bp  +0.9 bp
5           2 years  -0.121%   +0.9 bp  +1.3 bp
6           3 years  -0.113%   +2.2 bp  +2.7 bp
7           4 years  -0.094%   +2.6 bp  +2.1 bp
8           5 years  -0.082%   +2.3 bp  +1.8 bp
9           6 years  -0.056%   +3.4 bp  +0.4 bp
10          7 years  -0.029%   +5.1 bp  -0.4 bp
11          8 years   0.007%   +5.6 bp  -0.7 bp
12          9 years   0.052%   +5.6 bp  -1.3 bp
13         10 years   0.087%   +4.7 bp  -1.2 bp
14         15 years   0.288%   +4.3 bp  -2.4 bp
15         20 years   0.460%   +3.7 bp  -1.5 bp
16         30 years   0.689%   +3.5 bp  +1.6 bp
17         40 years   0.757%   +3.5 bp  +7.3 bp

我想要得到的期望输出

ResidualMaturity     Last    
0           1 month  -0.114%   
1          3 months  -0.109%    
2          6 months  -0.119%   
3          9 months  -0.119%  
4            1 year  -0.125%   
5           2 years  -0.121%   
6           3 years  -0.113%   
7           4 years  -0.094%   
8           5 years  -0.082%   
9           6 years  -0.056%   
10          7 years  -0.029%   
11          8 years   0.007%   
12          9 years   0.052%   
13         10 years   0.087%  
14         15 years   0.288%   
15         20 years   0.460%   
16         30 years   0.689%   
17         40 years   0.757%   

我已经尝试使用df[('Yield', 'Last')],但只能获得该特定列,而不是两者。

同时使用pd.IndexSlice.loc

idx = pd.IndexSlice
yc.loc[:, idx[:, ['ResidualMaturity', 'Last']]]

或者,在axis=1上使用.loc,如下所示:

idx = pd.IndexSlice
yc.loc(axis=1)[idx[:, ['ResidualMaturity', 'Last']]]

pd.IndexSlice以这种方式允许我们指定1级列标签而不指定0级列标签。

结果:

ResidualMaturity    Yield
ResidualMaturity     Last
0           1 month  -0.110%
1          3 months  -0.109%
2          6 months  -0.119%
3          9 months  -0.115%
4            1 year  -0.125%
5           2 years  -0.120%
6           3 years  -0.113%
7           4 years  -0.094%
8           5 years  -0.084%
9           6 years  -0.057%
10          7 years  -0.031%
11          8 years   0.005%
12          9 years   0.050%
13         10 years   0.086%
14         15 years   0.287%
15         20 years   0.461%
16         30 years   0.689%
17         40 years   0.757%

如果不想显示0级列索引:

idx = pd.IndexSlice
yc.loc(axis=1)[idx[:, ['ResidualMaturity', 'Last']]].droplevel(0, axis=1)

结果:

ResidualMaturity     Last
0           1 month  -0.110%
1          3 months  -0.109%
2          6 months  -0.119%
3          9 months  -0.115%
4            1 year  -0.125%
5           2 years  -0.120%
6           3 years  -0.113%
7           4 years  -0.094%
8           5 years  -0.084%
9           6 years  -0.057%
10          7 years  -0.031%
11          8 years   0.005%
12          9 years   0.050%
13         10 years   0.086%
14         15 years   0.287%
15         20 years   0.461%
16         30 years   0.689%
17         40 years   0.757%

下面是我得到的输出:

import pandas as pd
import requests
url = 'http://www.worldgovernmentbonds.com/country/japan/'
r = requests.get(url)
df_list = pd.read_html(r.text, flavor='html5lib')
df = df_list[4]
yc = df[df.columns[1:3]].droplevel(0, axis=1)
print(yc)

输出:

ResidualMaturity     Last
0           1 month  -0.110%
1          3 months  -0.109%
2          6 months  -0.119%
3          9 months  -0.115%
4            1 year  -0.125%
5           2 years  -0.120%
6           3 years  -0.113%
7           4 years  -0.094%
8           5 years  -0.084%
9           6 years  -0.057%
10          7 years  -0.031%
11          8 years   0.005%
12          9 years   0.050%
13         10 years   0.086%
14         15 years   0.287%
15         20 years   0.461%
16         30 years   0.689%
17         40 years   0.757%

最新更新