如何提取 div 和类



我是Python的新手,我需要为我的第一个爬虫获取标题,ISBN,价格和发布日期。

<div class="col-md-7 col-sm-7">
<h4><a href="https://www.fadavis.com/product/anatomy-physiology-pocket-AP-jones-3">Pocket Anatomy and Physiology, 3rd Edition</a></h4>
<div>Shirley A. Jones</div>
<div>ISBN-13: 978-0-8036-5658-1</div>
<p class="price"> $39.95 (US)</p>
<div class="prd_lst">
<ul class="book_list">
</ul>
<div class="mobile_add_tocart">
<button type="button" class="addtocart" onclick="window.location.href='https://shoppingcart.fadavis.com/ShoppingCart/AddToCart?guid=74779e63-ccfb-454e-a6b9-b4e9f9a50793&amp;productid=10959&amp;applicationid=5'"> <span class="cart_icon sprite pull-left"></span>Add to Cart</button>
</div>

<div class="popover bottom Available_tooltip"><div class="arrow"></div>
<div class="popover-content">
<ul class="book_list">
</ul>
<div class="clearfix"></div>
</div>
</div>                
<div class="clearfix"></div>
</div>
<p>Publication Date: 10/12/2016</p>
<div class="available active">
<div class="available_icon sprite pull-left"></div>
Available</div>
</div>
import bs4
html = """
<div class="col-md-7 col-sm-7">
<h4><a href="https://www.fadavis.com/product/anatomy-physiology-pocket-AP-jones-3">Pocket Anatomy and Physiology, 3rd Edition</a></h4>
<div>Shirley A. Jones</div>
<div>ISBN-13: 978-0-8036-5658-1</div>
<p class="price"> $39.95 (US)</p>
<div class="prd_lst">
<ul class="book_list">
</ul>
<div class="mobile_add_tocart">
<button type="button" class="addtocart" onclick="window.location.href='https://shoppingcart.fadavis.com/ShoppingCart/AddToCart?guid=74779e63-ccfb-454e-a6b9-b4e9f9a50793&amp;productid=10959&amp;applicationid=5'"> <span class="cart_icon sprite pull-left"></span>Add to Cart</button>
</div>

<div class="popover bottom Available_tooltip"><div class="arrow"></div>
<div class="popover-content">
<ul class="book_list">
</ul>
<div class="clearfix"></div>
</div>
</div>
<div class="clearfix"></div>
</div>
<p>Publication Date: 10/12/2016</p>
<div class="available active">
<div class="available_icon sprite pull-left"></div>
Available</div>
</div>
</div>
"""
soup=bs4.BeautifulSoup(html,'lxml')
div = soup.find('div', {'class': 'col-md-7'})
divs = div.findAll('div')
price = div.find('p', {'class': 'price'})
date = div.findAll('p')
print(divs[0].text)
print(divs[1].text)
print(price.text)
print(date[-1].text)

输出

Shirley A. Jones
ISBN-13: 978-0-8036-5658-1
$39.95 (US)
Publication Date: 10/12/2016

最新更新