Python BeautifulSoup: only the first row of the table is returned



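I've just started learning Python and Beautiful Soup. I'm trying to scrape river data from a few sites. Most of them work fine, but one site is giving me trouble. The URL is http://hydro.marlborough.govt.nz/reports/riverreport.html. I'm trying to get row 24 of the data from the main table.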

The code below seems to select the table, but it returns only the header and the first row.

tMain_table = soup.select_one("table:nth-of-type(1)")
print (tMain_table)
<table class="table table-striped table-bordered table-hover">
<thead style="background-color: #4d4c4f;color: white;">
<tr>
<th class="text-center">Site Name</th>
<th class="text-center"><div>Date/Time </div>(NZST)</th>
<th class="text-center"><div>Flow</div>(m3/s)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day</div>Peak Flow</th>
<th class="text-center"><div>Stage</div>(m)</th>
<th class="text-center"><div>Change</div>(mm/hr)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day </div>Peak Stage</th>
<th class="text-center"><div>Peak</div>Date/Time</th>
</tr>
</thead>
<tbody>
<tr ng-repeat="item in data ">
<td nowrap="nowrap">{{item.SiteName}}</td>
<td class="text-center" nowrap="nowrap">{{item.LastUpdate | asDate | date:'d MMM yy         HH:mm'}} </td>
<td class="text-center" nowrap="nowrap">{{item.Flow}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakFlow}}</td>
<td class="text-center" nowrap="nowrap">{{item.Stage}}</td>
<td class="text-center" nowrap="nowrap">{{item.StageChange}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStage}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStageDate | asDate | date:'d MMM yy HH:mm'}}</td>
</tr>
</tbody>
</table>

Similarly, the code below also returns only the first row.

table = soup.findAll('tr')
print (table)
[<tr>
<th class="text-center">Site Name</th>
<th class="text-center"><div>Date/Time </div>(NZST)</th>
<th class="text-center"><div>Flow</div>(m3/s)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day</div>Peak Flow</th>
<th class="text-center"><div>Stage</div>(m)</th>
<th class="text-center"><div>Change</div>(mm/hr)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day </div>Peak Stage</th>
<th class="text-center"><div>Peak</div>Date/Time</th>
</tr>, <tr ng-repeat="item in data ">
<td nowrap="nowrap">{{item.SiteName}}</td>
<td class="text-center" nowrap="nowrap">{{item.LastUpdate | asDate | date:'d MMM yy HH:mm'}} </td>
<td class="text-center" nowrap="nowrap">{{item.Flow}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakFlow}}</td>
<td class="text-center" nowrap="nowrap">{{item.Stage}}</td>
<td class="text-center" nowrap="nowrap">{{item.StageChange}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStage}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStageDate | asDate | date:'d MMM yy HH:mm'}} 
</td>
</tr>]

Thanks for any help.

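You need something that can render the page's dynamic content, for example Selenium. The single row in your output is still an AngularJS template (note the ng-repeat attribute and the {{item.SiteName}} placeholders); the real rows are filled in by JavaScript in the browser, so BeautifulSoup on its own never sees them.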
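A quick way to confirm this from Python is to check whether the HTML you downloaded still contains template rows. The following is only a minimal sketch: `looks_unrendered` is a hypothetical helper name, `STATIC_HTML` is a trimmed copy of the markup printed in the question, and the check assumes the page uses AngularJS-style `ng-repeat` rows with `{{...}}` placeholders.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup printed in the question (assumption: the
# full page has the same shape, only with more columns).
STATIC_HTML = """
<table>
<thead><tr><th>Site Name</th><th>Flow</th></tr></thead>
<tbody>
<tr ng-repeat="item in data ">
<td>{{item.SiteName}}</td>
<td>{{item.Flow}}</td>
</tr>
</tbody>
</table>
"""


def looks_unrendered(html):
    """Return True if any table row is still a template with {{...}} placeholders."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        if row.has_attr("ng-repeat") and "{{" in row.get_text():
            return True
    return False


print(looks_unrendered(STATIC_HTML))  # True
```

If this returns True for the HTML you fetched, the data has not been rendered yet and a browser-driven tool is needed.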

A minimal example:

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://hydro.marlborough.govt.nz/reports/riverreport.html"  # the page from the question

with webdriver.Firefox() as driver:  # there are other drivers available
    # driver.implicitly_wait(10)
    driver.get(url)  # get() returns None, so read page_source from the driver
    rendered_wpage = driver.page_source

soup = BeautifulSoup(rendered_wpage, 'lxml')
# scrape here
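The "scrape here" step could look roughly like the sketch below. It is not a tested solution for this particular site: `fetch_rendered_html` and `table_rows` are hypothetical helper names, the `table tbody tr` selector and the 15 second timeout are assumptions, and it presumes Firefox with its driver is installed. Because the rows arrive asynchronously, it waits explicitly until more than one body row exists instead of reading the page source immediately.

```python
from bs4 import BeautifulSoup

URL = "http://hydro.marlborough.govt.nz/reports/riverreport.html"


def fetch_rendered_html(url, timeout=15):
    """Load the page in Firefox and return its HTML once the rows exist."""
    # Imported here so table_rows() below can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    with webdriver.Firefox() as driver:
        driver.get(url)
        # Assumption: the rendered table has more than one data row,
        # whereas the unrendered template has exactly one.
        WebDriverWait(driver, timeout).until(
            lambda d: len(d.find_elements(By.CSS_SELECTOR, "table tbody tr")) > 1
        )
        return driver.page_source


def table_rows(html):
    """Return the body rows of the first table as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select_one("table").select("tbody tr")
    return [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows]
```

With these in place, `rows = table_rows(fetch_rendered_html(URL))` would give every data row, and `rows[23]` would be the 24th one, provided the table actually has that many rows.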
