Need help extracting 4 numbers from HTML in Python



I have already written code that opens a URL and reads the HTML into htmlA.

From htmlA, I am trying to extract 4 pieces of information:

  1. Date
  2. Price 1
  3. Price 2
  4. Percentage

The part of htmlA that contains these 4 pieces of information looks like this:

<!-- TAB CONTENT -->\r\n\t\t\t<div class="fund-content tab-content span12">\r\n\r\n\t\t\t\t<!-- OVERVIEW -->\r\n\t\t\t\t<div class="tab-pane active" id="overview">\r\n\t\t\t\t\t<h3 class="subhead tab-header">Overview</h3>\r\n\t\t\t\t\t<div class="row-fluid">\r\n\t\t\t\t\t\t<div class="span6">\r\n\t\t\t\t\t\t\t<p class="as-of-date">\r\n\t\t\t\t\t\t\t\t<span id="ContentPlaceHolder1_cph_main_cph_main_AsOfLabel">As of 9/24/2021</span>\r\n\t\t\t\t\t\t\t</p>\r\n\r\n\t\t\t\t\t\t\t<div class="table-wrapper">\r\n\t\t\t\t\t\t\t\t<div>\r\n\t<table class="cefconnect-table-1 table table-striped" cellspacing="0" cellpadding="5" Border="0" id="ContentPlaceHolder1_cph_main_cph_main_SummaryGrid">\r\n\t\t<tr class="tr-header">\r\n\t\t\t<th scope="col">&nbsp;</th><th class="right-align" scope="col">Share<br>Price</th><th class="right-align" scope="col">NAV</th><th class="right-align" scope="col">Premium/<br>Discount</th>\r\n\t\t</tr><tr>\r\n\t\t\t<td>Current</td><td class="right-align">$19.14</td><td class="right-align">$21.82</td><td class="right-align">-12.28%</

In this example, I would like to extract:

  1. 9/24/2021
  2. $19.14
  3. $21.82
  4. -12.28%

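I have tried using BeautifulSoup to search htmlA, but I am having trouble picking out the 4 specific pieces of information I need. Could someone help me with the code? Thanks very much!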

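I can't give you a complete answer, but I can point you in the right direction.

You need to parse the HTML content into a BeautifulSoup object so that you can work with the page content in a Pythonic way. For example: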

from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/Elon_Musk'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'html.parser')
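Since the question already has the page source in htmlA, the same parsing step can probably be applied to that string directly, without a second request. A minimal sketch, assuming htmlA is a str; the value below is a trimmed, hypothetical stand-in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the asker's htmlA (a trimmed piece of the snippet).
htmlA = (
    '<p class="as-of-date">'
    '<span id="ContentPlaceHolder1_cph_main_cph_main_AsOfLabel">'
    'As of 9/24/2021</span>'
    '</p>'
)

# BeautifulSoup accepts markup as a str (or bytes), so htmlA can be passed as-is.
soup = BeautifulSoup(htmlA, 'html.parser')

as_of_class = soup.p['class']    # class is multi-valued, so this is a list
span_text = soup.span.get_text() # text inside the first <span>
print(as_of_class, span_text)
```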

Once you have the soup variable, you can call various methods on it, such as

print(soup.div) 

which gives you the first div element in the document, and so on.
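Navigating by tag name only returns the first match, so for the HTML in the question it is probably easier to look elements up by their id attributes. The sketch below is one possible approach, not a tested solution for the live page: the sample markup is a simplified copy of the snippet from the question, and the closing tags after -12.28% are an assumption because the original snippet is cut off there.

```python
from bs4 import BeautifulSoup

# Simplified copy of the snippet in the question. The closing </td></tr></table>
# is assumed, since the original snippet is truncated at that point.
html = '''
<p class="as-of-date">
  <span id="ContentPlaceHolder1_cph_main_cph_main_AsOfLabel">As of 9/24/2021</span>
</p>
<table id="ContentPlaceHolder1_cph_main_cph_main_SummaryGrid">
  <tr class="tr-header">
    <th scope="col">&nbsp;</th><th>Share<br>Price</th><th>NAV</th><th>Premium/<br>Discount</th>
  </tr>
  <tr>
    <td>Current</td><td class="right-align">$19.14</td>
    <td class="right-align">$21.82</td><td class="right-align">-12.28%</td>
  </tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

# 1. The date sits in a <span> with a known id; drop the "As of " prefix.
date_span = soup.find(id='ContentPlaceHolder1_cph_main_cph_main_AsOfLabel')
as_of = date_span.get_text(strip=True).replace('As of ', '')

# 2-4. The prices and the percentage are <td> cells in the "Current" row.
table = soup.find(id='ContentPlaceHolder1_cph_main_cph_main_SummaryGrid')
current_row = None
for row in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    if cells and cells[0] == 'Current':  # header row has <th> only, so it is skipped
        current_row = cells
        break

share_price, nav, discount = current_row[1:4]
print(as_of, share_price, nav, discount)
```

On the real page, find() returns None when an id is missing, so it is worth checking the result before calling get_text() on it.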

More examples:

soup.title
# <title>The Dormouse's story</title>
soup.title.name
# u'title'
soup.title.string
# u'The Dormouse's story'
soup.title.parent.name
# u'head'
soup.p
# <p class="title"><b>The Dormouse's story</b></p>
soup.p['class']
# ['title']  (class is a multi-valued attribute, so bs4 returns a list)
soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

From https://www.crummy.com/software/BeautifulSoup/bs4/doc/
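Returning to the four values in the question: BeautifulSoup hands back text, so if actual numbers are needed the strings still have to be converted. A small sketch using only the standard library; the helper names parse_money and parse_percent are hypothetical, and the input strings are the ones listed in the question:

```python
from datetime import datetime

def parse_money(text):
    """Turn a string such as '$19.14' into a float (hypothetical helper)."""
    return float(text.strip().lstrip('$').replace(',', ''))

def parse_percent(text):
    """Turn a string such as '-12.28%' into a float (hypothetical helper)."""
    return float(text.strip().rstrip('%'))

# Strings as they appear in the question's HTML.
as_of = datetime.strptime('9/24/2021', '%m/%d/%Y').date()
share_price = parse_money('$19.14')
nav = parse_money('$21.82')
discount = parse_percent('-12.28%')

print(as_of, share_price, nav, discount)
```

The date format '%m/%d/%Y' assumes the site writes month first, which matches the single example given but has not been checked against other dates.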
