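I'm trying to scrape bond information from a web page. With Selenium I can get the data for the first few rows of the table that holds it, but some rows and columns are not scraped, and I don't know why.
The page in question lists bond information.
Code: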
import pandas as pd
from selenium.webdriver.common.by import By

# driver is assumed to be a Selenium WebDriver that has already loaded the page
a = driver.find_elements(By.TAG_NAME, 'sgx-table-row')
combined = [[] for _ in a]  # one empty list per table row
for counter, row in enumerate(a):
    ticker = row.find_elements(By.TAG_NAME, 'a')
    name = row.find_elements(By.TAG_NAME, 'sgx-table-cell-text')
    price1 = row.find_elements(By.TAG_NAME, 'sgx-table-cell-number')
    # Keep only cells with non-empty text, in ticker, name, price order
    for cell in ticker + name + price1:
        if len(cell.text) != 0:
            combined[counter].append(cell.text)
df = pd.DataFrame(combined)
print(df)
Output:
0 N518100E 230201 CMHS 99.000 99 0.827 98.173 ﹣ ﹣ 0
1 N519100A 240201 LSHS 97.000 97 0.945 96.055 ﹣ ﹣ 0
2 N520100A 251101 QGES ﹣ ﹣ 0.111 ﹣ ﹣ ﹣ 0
3 N521100V 261101 IRRS ﹣ ﹣ 0 ﹣ ﹣ ﹣ 0
4 NA12100N 420401 PH1S 110.000 110 0.842 109.158 ﹣ ﹣ 0
5 NA16100H 460301 BJGS 108.000 108 1.069 106.931 ﹣ ﹣ 0
6 NA20100F 500301 ZL8S 108.000 108 0.729 107.271 ﹣ ﹣ 0
7 NA21200W 511001 ZFGS 87.000 87 0 87 ﹣ ﹣ 0
8 NX13100H 230701 R1MS 101.500 101.5 0.157 101.343 ﹣ ﹣ 0
9 NX15100Z 250601 AFUS 99.701 99.701 0.331 99.37 ﹣ ﹣ 0
10 NX16100F 260601 BJHS 102.000 102 0.296 101.704 ﹣ ﹣ 0
11 NX18100A 280501 CMGS 90.000 90 0.585 89.415 ﹣ ﹣ 0
12 NX21100N 310701 RXYS ﹣ ﹣ 0.093 ﹣ ﹣ ﹣ 0
13 NY07100X 220901 7PMS 101.380 101.38 1.214 100.166 ﹣ ﹣ 0
14 None None None None None None None None None
15 None None None None None None None None None
16 None None None None None None None None None
17 None None None None None None None None None
18 None None None None None None None None None
19 None None None None None None None None None
20 None None None None None None None None None
21 None None None None None None None None None
22 None None
As shown, past a certain point the find_all method returns None, even though the HTML on the page has the same format (same class names and tags).
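One way to narrow this down is to check whether the None values come from rows whose cells had no text at scrape time. pandas pads shorter rows with None when building a DataFrame from lists of unequal length, so a row that collected nothing shows up as all None. The sketch below is only a diagnostic aid: `summarize_rows` is a hypothetical helper, and `sample` is made-up stand-in data shaped like the `combined` list above, not real scrape output.

```python
from collections import Counter

def summarize_rows(combined):
    """Count rows by length and list the indices of rows with no text."""
    lengths = Counter(len(row) for row in combined)
    empty = [i for i, row in enumerate(combined) if not row]
    return lengths, empty

# Hypothetical stand-in for the scraped `combined` list
sample = [
    ["N518100E", "230201", "CMHS", "99.000"],
    ["N519100A", "240201", "LSHS", "97.000"],
    [],  # row element found, but every cell had empty text
    [],
]

lengths, empty = summarize_rows(sample)
print(dict(lengths))  # how many rows of each length
print(empty)          # indices of rows that collected nothing
```

If the empty indices start where the None rows begin, the row elements were found but their cell text was empty when it was read.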
The page is loaded dynamically: it calls several APIs and receives JSON data from them. Does the following result help?
import requests
import pandas as pd
r = requests.get('https://api.sgx.com/securities/v1.1/bonds?params=nc%2Cadjusted-vwap%2Cbond_accrued_interest%2Cbond_clean_price%2Cbond_dirty_price%2Cbond_date%2Cb%2Cbv%2Cp%2Cc%2Cchange_vs_pc%2Cchange_vs_pc_percentage%2Ccx%2Ccn%2Cdp%2Cdpc%2Cdu%2Ced%2Cfn%2Ch%2Ciiv%2Ciopv%2Clt%2Cl%2Co%2Cp_%2Cpv%2Cptd%2Cs%2Csv%2Ctrading_time%2Cv_%2Cv%2Cvl%2Cvwap%2Cvwap-currency')
df = pd.DataFrame(r.json()['data']['prices'])
df
This returns a DataFrame with 38 rows × 34 columns:
pv bond_dirty_price lt fn trading_time dp type du bv dpc ... p_ p bond_accrued_interest change_vs_pc s nc cx vl v bond_date
0 1.014 101.4 1.014 None 20220722_090753 None retailbonds None 45.0 None ... X 0.000 0.501 None 1.019 RMRB 0.0 45.0 45630.0 1658419200000
1 0.998 99.7 0.997 None 20220722_090824 None retailbonds None 22.0 None ... X -0.100 0.380 None 1.000 5A1B 0.0 41.0 40874.0 1658419200000
2 0.964 96.3 0.963 None 20220722_090824 None retailbonds None 7.0 None ... X -0.104 1.068 None 0.966 6AZB 0.0 82.0 79028.0 1658419200000
3 1.013 101.3 1.013 None 20220722_090824 None retailbonds None 20.0 None ... X 0.000 0.678 None 1.015 V7AB 0.0 80.0 81040.0 1658419200000
4 1.011 101.3 1.013 None 20220722_090825 None retailbonds None 22.0 None ... X 0.198 0.983 None 1.013 V7BB 0.0 9.0 9117.0 1658419200000
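The `bond_date` and `trading_time` columns come back as raw values. A minimal sketch for making them readable follows. It assumes `bond_date` is a Unix timestamp in milliseconds and that the dates are meant in Singapore time (UTC+8). Both are guesses based on the sample values above, not something the API documents here. `parse_bond_date` and `parse_trading_time` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are meant in Singapore time (UTC+8)
SGT = timezone(timedelta(hours=8))

def parse_bond_date(ms):
    """Convert epoch milliseconds (assumed) to a calendar date in SGT."""
    return datetime.fromtimestamp(ms / 1000, tz=SGT).date()

def parse_trading_time(text):
    """Parse a 'YYYYMMDD_HHMMSS' string into a naive datetime."""
    return datetime.strptime(text, "%Y%m%d_%H%M%S")

# Values taken from the first row of the output above
print(parse_bond_date(1658419200000))
print(parse_trading_time("20220722_090753"))
```

If those assumptions hold, the helpers could be applied to the DataFrame with something like `df['bond_date'].map(parse_bond_date)`.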