How do I pull only the price text from this HTML with bs4 in Python?



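I'm building a web scraper and I'm having trouble pulling only the price from this page. Python is also pulling the $550 monthly figure, but I only want the $41,991. The HTML is as follows: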

<div class="snapshot__body-content">
<div class="snapshot__col1">
<ul class="snapshot__details list-unstyled">
<li class="snapshot__details-price">
<sup>
$
</sup>
41,991
<!-- -->
<a class="btn-link snapshot__details-monthly hidden-xs hidden-sm" href="/vehicle/details/73082384">
<sup>
$
</sup>
<span>
550
</span>
/mo*
</a>

Here is my current bs4 code.

try:
    data["Price"] = item.find_all("li", {"class":"snapshot__details-price"})[0].text.replace("/mo*","")
except:
    data["Price"] = None

Using .contents, which lists the tag's direct children, so the price can be picked out by index:

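for data in soup.select(".snapshot__details-price"):
    # For the original (unformatted) HTML, the price is the second child
    print("$" + data.contents[1].strip())
    # For the formatted HTML above, whitespace adds a leading text node,
    # so the price is the third child
    # print("$" + data.contents[2].strip())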
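Because the right index depends on how the HTML is formatted, a variation that does not hard-code the index may be more robust. The following is only a sketch: the helper name `direct_text` is made up for illustration, the markup is a trimmed copy of the snippet from the question, and it uses the built-in `html.parser` so no extra parser is needed. It collects only the text nodes that are direct children of the `li`, skipping comments and whitespace.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Trimmed copy of the markup from the question (assumed structure)
html = """<li class="snapshot__details-price">
<sup>$</sup>
41,991
<!-- -->
<a class="btn-link snapshot__details-monthly" href="/vehicle/details/73082384"><sup>$</sup><span>550</span>/mo*</a>
</li>"""


def direct_text(tag):
    """Join the non-empty text nodes that are direct children of `tag`.

    Hypothetical helper: text inside nested tags (such as the monthly
    payment link) and HTML comments are ignored.
    """
    parts = []
    for child in tag.children:
        if isinstance(child, NavigableString) and not isinstance(child, Comment):
            text = child.strip()
            if text:
                parts.append(text)
    return " ".join(parts)


soup = BeautifulSoup(html, "html.parser")
prices = ["$" + direct_text(li) for li in soup.select("li.snapshot__details-price")]
print(prices)
```

Since the dollar sign sits inside a nested `sup` tag, it is not a direct text child and has to be added back by hand, as in the answer above.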

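You can use the get_text() method to extract the tag's inner text, then rsplit() on the last "$" to drop the monthly payment and keep only the price.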

from bs4 import BeautifulSoup

# Sample markup standing in for the downloaded page
response = """<div class="snapshot__body-content">
<div class="snapshot__col1">
<ul class="snapshot__details list-unstyled">
<li class="snapshot__details-price">
<sup>
$
</sup>
41,991


<!-- -->
<a class="btn-link snapshot__details-monthly hidden-xs hidden-sm" href="/vehicle/details/73082384">
<sup>
$
</sup>
<span>
550
</span>
/mo*


</a>
</li>
</ul>
</div>
</div>"""
soup = BeautifulSoup(response, 'lxml')
for data in soup.find_all('li',{"class":"snapshot__details-price"}):
    # get_text(strip=True) gives "$41,991$550/mo*"; splitting once on the
    # last "$" leaves "$41,991"
    print(data.get_text(strip=True).rsplit('$', maxsplit=1)[0])
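An alternative that does not depend on where the "$" characters fall is to remove the monthly-payment link from the tree before reading the text. This is a minimal sketch under a few assumptions: the markup is a trimmed copy of the snippet from the question, the link can be identified by its `snapshot__details-monthly` class, and the names `price_text` and `price_value` are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumed structure)
html = """<ul class="snapshot__details list-unstyled">
<li class="snapshot__details-price">
<sup>
$
</sup>
41,991
<!-- -->
<a class="btn-link snapshot__details-monthly hidden-xs hidden-sm" href="/vehicle/details/73082384">
<sup>
$
</sup>
<span>
550
</span>
/mo*
</a>
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

price_text = None
for li in soup.select("li.snapshot__details-price"):
    # Drop the nested monthly-payment link so its text is not included
    monthly = li.find("a", class_="snapshot__details-monthly")
    if monthly is not None:
        monthly.decompose()
    price_text = li.get_text(strip=True)
    print(price_text)

# Optional: convert the price to a number for later comparisons
price_value = int(price_text.lstrip("$").replace(",", ""))
print(price_value)
```

Note that decompose() modifies the parsed tree, so if the monthly figure is needed later it should be read before the link is removed.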
