How to scrape table data from a not-yet-loaded tab on a website in Python using BeautifulSoup



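I am trying to scrape index data from this website. Specifically, I want the rollover data from the Index tab, but when I scrape the table, its contents look like this: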

<table cellspacing="0" class="derivatives_section table table-striped responsive dt-responsive nowrap derivatives_rollover_tbl" id="rollover_index_table" width="100%">
<thead>
<tr>
<th>Index</th>
<th>Future<br/> Price</th>
<th>% Price<br/> Chg.</th>
<th>% OI<br/> Chg.</th>
<th>No. of Shares<br/> Rolled</th>
<th>% Rollover</th>
<th id="ro_idx_1">% Chg Rollover <br/> Vs. 1 Month Avg.</th>
<th>% Rollover <br/>Cost </th>
<th id="ro_idx_2">% Chg Rollover Cost <br/> Vs. 1 Month Avg.</th>
</tr>
</thead>
<tbody>
<tr>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
</tr>
<tr>

Here is the code that produces the result shown above:

import requests
from bs4 import BeautifulSoup

url = 'https://www.indiainfoline.com/markets/derivatives/rollover#derivatives_index'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
# Fetch the static HTML; scripts on the page are not executed
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
# Locate the Index rollover table and the cells of its first body row
table = soup.find('table', {'id': 'rollover_index_table'})
tbody = table.find('tbody')
tr = tbody.find('tr')
td = tr.find_all('td')
print(td)  # prints only the "loading" placeholder cells

How can I scrape the data in the website's Index tab?

The data comes from an API call that returns JSON. You can build a DataFrame from it as follows:

import requests
import pandas as pd
r = requests.get('https://www.indiainfoline.com/api/papi-call-api.php?url=/Derivative/Derivative.svc/FNO-Rollover/FUTSTK/?responsetype=json').json()
df = pd.DataFrame(r['response']['data']['FNORollOverList']['FNORollOverdata'])
print(df)
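
As a hedged variation on the snippet above, the sketch below separates fetching from unpacking so that a change in the response shape fails with a clear message instead of a bare `KeyError` deep in a one-liner. It uses the standard library's `urllib` in place of `requests` purely to stay dependency-free. The function names `fetch_payload` and `extract_rows` are illustrative, not part of any library. The URL and the key path are copied from the answer; note that the URL requests the `FUTSTK` segment, so whether the Index tab uses a different segment code is something to confirm in the browser's network tab.

```python
import json
from urllib.request import Request, urlopen

# URL copied verbatim from the answer above (FUTSTK segment).
API_URL = (
    "https://www.indiainfoline.com/api/papi-call-api.php"
    "?url=/Derivative/Derivative.svc/FNO-Rollover/FUTSTK/?responsetype=json"
)

# Key path taken from the answer's DataFrame line.
KEY_PATH = ("response", "data", "FNORollOverList", "FNORollOverdata")


def fetch_payload(url=API_URL, timeout=10):
    """Download the JSON document and return it as Python objects."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_rows(payload):
    """Walk KEY_PATH through the payload and return the rows as a list."""
    node = payload
    for key in KEY_PATH:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"missing key {key!r} in API response")
        node = node[key]
    # Assumption: a lone record might arrive as a dict rather than a list.
    return node if isinstance(node, list) else [node]
```

Calling `extract_rows(fetch_payload())` should then yield a list of dictionaries that can be passed straight to `pd.DataFrame(...)`, as in the answer.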

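Just to explain what @QHarr did: the content of this site is generated dynamically, meaning JavaScript fills in the table from JSON data after the page has loaded. The HTML that requests downloads and BeautifulSoup parses is the page before any script has run, so, as you can see below, the data is not there yet and cannot be retrieved that way.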

<div class="bs-component deri_roll_main">
<div class="row">
<div class="col-sm-6 col-xs-12">
<ul class="nav nav-tabs mb0">
<li id="stk_tab" class="active"><a href="#derivatives_stock" data-toggle="tab">Stock</a></li>
<li id="idx_tab"><a href="#derivatives_index" data-toggle="tab">Index</a></li>
</ul>
</div>
<div class="clearfix hidden visible-xs gray_bdr_b"></div>
<div class="col-sm-6 col-xs-12 txt_left_m text-right">
<div class="fill_exp_date w100p"><span>Expiry Date -</span> </div>
</div>
</div>
<div id="myTabContent" class="tab-content">
<div class="tab-pane fade active in" id="derivatives_stock">
<!-- <table id="derivatives_rollover_tbl" class="derivatives_rollover_tbl display nowrap" style="width:100%">-->
<div class="tablepanel">
<table class="derivatives_section table table-striped  responsive dt-responsive nowrap derivatives_rollover_tbl" cellspacing="0" width="100%" id="rollover_stock_table">
<thead>
<tr>
<th>Script</th>
<th >Future<br> Price</th>
<th>% Price<br> Chg</th>
<th>% OI<br> Chg</th>
<th>No. of Shares<br> Rolled</th>
<th>% Rollover</th>
<th id="ro_stk_1">% Chg Rollover <br> VS 1 Month.Avg</th>
<th>RO<br>Cost </th>
<th id="ro_stk_2">% Chg Rollover <br> VS 1 Month.Avg</th>
</tr>
</thead>
<tbody>
<tr>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
</tr>
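
The placeholder rows in the markup above can also be detected in code, which is a quick way to confirm that a page is filled in by JavaScript before looking for its API. The following is a minimal sketch, assuming the placeholders always look like the `<div class="text-line loading">` cells shown here; the function name `table_is_placeholder` is made up for illustration.

```python
from bs4 import BeautifulSoup


def table_is_placeholder(html, table_id):
    """Return True if the table's body cells hold only empty 'loading' divs."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id {table_id!r}")
    cells = table.find_all("td")
    if not cells:
        return True  # no body cells at all means no data either
    # A cell counts as a placeholder when it contains a div whose class list
    # includes "loading" and it carries no visible text.
    return all(
        cell.find("div", class_="loading") is not None
        and not cell.get_text(strip=True)
        for cell in cells
    )
```

With the question's code, `table_is_placeholder(response.text, "rollover_index_table")` would be expected to return `True`, which signals that the data has to come from elsewhere.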

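One way to handle this, and the best one in this case, is to fetch the data directly from the API call. That is not always possible, though. The second approach is to use a tool that executes JavaScript and renders the data for you, such as Selenium or Scrapy with Splash.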
