How to scrape nested tag elements with Python



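Hi, I want to get some of the data below, from the <del> or <ins> tags, but I can't find a solution. Does anyone have an idea? Is there a short way to get this information?

Here is my Python code: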

import requests
import json
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}

base_url = "https://www.n11.com/super-firsatlar"

r = requests.get(base_url, headers=header)

if r.status_code == 200:
    soup = BeautifulSoup(r.text, 'html.parser')
    books = soup.find_all('li', attrs={"class": "column"})

    result = []
    for book in books:
        title = book.find('h3').text
        link = base_url + book.find('a')['href']
        picture = base_url + book.find('img')['src']

        price = soup.find('a', attrs={"class": "ins"})

        single = {'title': title, 'link': link, 'picture': picture, 'price': price}
        result.append(single)

    with open('book.json', 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
else:
    print(r.status_code)

<div class="proDetail">
  <a href="https://test.com" class="oldPrice" title="Premium">
    <del>69,00 TL</del>
  </a>
  <a href="https://test.com" class="newPrice" title="Premium">
    <ins>14,90</ins>
  </a>
</div>

This is my output:

{
"title": "Premium",
"link": "https://test.com",
"picture": "https://pic.gif",
"price": null
},

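You are searching for the wrong class: "ins" is a tag name, not a class, so find() returns None and the JSON shows null. First search for the class "newPrice" to get the <a> block: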

price = book.find('a', attrs={'class': 'newPrice'}) 

Then you can search inside that <a> block for the <ins> block, like this:

price = book.find('a', attrs={'class': 'newPrice'}).find('ins')

Your result will then look like:

<ins>14,90</ins>

For the final result, strip the HTML tags and surrounding whitespace:

price = book.find('a', attrs={'class': 'newPrice'}).find('ins').text.strip() 
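
The chained find() calls above raise an AttributeError if an item has no "newPrice" link. As a rough sketch of one way to guard against that, the snippet below runs against the sample markup from the question rather than the live site. The helper name extract_text and the surrounding <li class="column"> wrapper are assumptions added for illustration, and a CSS selector with select_one is used in place of the chained find() calls.

```python
from bs4 import BeautifulSoup

# Sample markup from the question. The <li class="column"> wrapper is an
# assumption, added so that `book` resembles one item from the original loop.
SAMPLE_HTML = """
<li class="column">
  <div class="proDetail">
    <a href="https://test.com" class="oldPrice" title="Premium">
      <del>69,00 TL</del>
    </a>
    <a href="https://test.com" class="newPrice" title="Premium">
      <ins>14,90</ins>
    </a>
  </div>
</li>
"""


def extract_text(parent, selector):
    """Hypothetical helper: stripped text of the first CSS match, or None."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
book = soup.find("li", attrs={"class": "column"})

new_price = extract_text(book, "a.newPrice ins")    # <ins> nested in a.newPrice
old_price = extract_text(book, "a.oldPrice del")    # <del> nested in a.oldPrice
missing = extract_text(book, "a.noSuchClass ins")   # no match, so None

print(new_price)  # 14,90
print(old_price)  # 69,00 TL
print(missing)    # None
```

Because the helper returns plain strings or None, the values can be passed to json.dump directly, with a missing price written as null instead of stopping the script.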
