我在找出我想要刮掉并输出到csv文件的表上标题的索引时遇到了麻烦,所以我需要在ResidualMaturity
和Last
下分类的列,我只能获得表的主标题而不是子标题。我尝试过使用df[('Yield', 'Last')
,但只能获得该特定列而不是两者。
import pandas as pd
import requests
url = 'http://www.worldgovernmentbonds.com/country/japan/'
r = requests.get(url)
df_list = pd.read_html(r.text, flavor='html5lib')
df = df_list[4]
yc = df[["ResidualMaturity", "Yield"]]
print(yc)
电流输出
ResidualMaturity Yield
ResidualMaturity Last Chg 1M Chg 6M
0 1 month -0.114% +9.0 bp +7.4 bp
1 3 months -0.109% 0.0 bp -1.9 bp
2 6 months -0.119% -0.3 bp -1.9 bp
3 9 months -0.119% +10.0 bp +9.9 bp
4 1 year -0.125% -0.7 bp +0.9 bp
5 2 years -0.121% +0.9 bp +1.3 bp
6 3 years -0.113% +2.2 bp +2.7 bp
7 4 years -0.094% +2.6 bp +2.1 bp
8 5 years -0.082% +2.3 bp +1.8 bp
9 6 years -0.056% +3.4 bp +0.4 bp
10 7 years -0.029% +5.1 bp -0.4 bp
11 8 years 0.007% +5.6 bp -0.7 bp
12 9 years 0.052% +5.6 bp -1.3 bp
13 10 years 0.087% +4.7 bp -1.2 bp
14 15 years 0.288% +4.3 bp -2.4 bp
15 20 years 0.460% +3.7 bp -1.5 bp
16 30 years 0.689% +3.5 bp +1.6 bp
17 40 years 0.757% +3.5 bp +7.3 bp
我想要得到的期望输出
ResidualMaturity Last
0 1 month -0.114%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.119%
4 1 year -0.125%
5 2 years -0.121%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.082%
9 6 years -0.056%
10 7 years -0.029%
11 8 years 0.007%
12 9 years 0.052%
13 10 years 0.087%
14 15 years 0.288%
15 20 years 0.460%
16 30 years 0.689%
17 40 years 0.757%
我已经尝试使用df[('Yield', 'Last')]
,但只能获得该特定列,而不是两者。
同时使用pd.IndexSlice
和.loc
idx = pd.IndexSlice
yc.loc[:, idx[:, ['ResidualMaturity', 'Last']]]
或者,在axis=1
上使用.loc
,如下所示:
idx = pd.IndexSlice
yc.loc(axis=1)[idx[:, ['ResidualMaturity', 'Last']]]
pd.IndexSlice
以这种方式允许我们指定1级列标签而不指定0级列标签。
结果:
ResidualMaturity Yield
ResidualMaturity Last
0 1 month -0.110%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.115%
4 1 year -0.125%
5 2 years -0.120%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.084%
9 6 years -0.057%
10 7 years -0.031%
11 8 years 0.005%
12 9 years 0.050%
13 10 years 0.086%
14 15 years 0.287%
15 20 years 0.461%
16 30 years 0.689%
17 40 years 0.757%
如果不想显示0级列索引:
idx = pd.IndexSlice
yc.loc(axis=1)[idx[:, ['ResidualMaturity', 'Last']]].droplevel(0, axis=1)
结果:
ResidualMaturity Last
0 1 month -0.110%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.115%
4 1 year -0.125%
5 2 years -0.120%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.084%
9 6 years -0.057%
10 7 years -0.031%
11 8 years 0.005%
12 9 years 0.050%
13 10 years 0.086%
14 15 years 0.287%
15 20 years 0.461%
16 30 years 0.689%
17 40 years 0.757%
下面是我得到的输出:
import pandas as pd
import requests
url = 'http://www.worldgovernmentbonds.com/country/japan/'
r = requests.get(url)
df_list = pd.read_html(r.text, flavor='html5lib')
df = df_list[4]
yc = df[df.columns[1:3]].droplevel(0, axis=1)
print(yc)
输出:
ResidualMaturity Last
0 1 month -0.110%
1 3 months -0.109%
2 6 months -0.119%
3 9 months -0.115%
4 1 year -0.125%
5 2 years -0.120%
6 3 years -0.113%
7 4 years -0.094%
8 5 years -0.084%
9 6 years -0.057%
10 7 years -0.031%
11 8 years 0.005%
12 9 years 0.050%
13 10 years 0.086%
14 15 years 0.287%
15 20 years 0.461%
16 30 years 0.689%
17 40 years 0.757%