I am trying to web-scrape a website, and I want to ignore some elements inside a div class.
```python
import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239")
soup = BeautifulSoup(r.text, 'html.parser')
data = []
for div in soup.find_all("div", class_="bbWrapper"):
    # Remove the quoted block, if present (find() returns None otherwise)
    try:
        div.find('blockquote', class_="bbCodeBlock bbCodeBlock--expandable bbCodeBlock--quote").extract()
    except AttributeError:
        pass
    # Remove the quote content; a bare string here would be read as a tag name
    try:
        div.find(class_='bbCodeBlock-content').extract()
    except AttributeError:
        pass
    # Remove the signature, if present
    try:
        div.find("aside", class_="message-signature").extract()
    except AttributeError:
        pass
    result = [div.get_text(strip=True, separator=" ")]
    data.append(result)
```
My output for `data[2]` should be:
Subaru dealer by me uses an orange construction cone for demo. Find one and try it. Won’t hurt anything if it doesn’t work.
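But instead it gives the content from the previous message. How can I ignore the elements in `class_="message-signature"`, and how can I get this output? Thanks in advance.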
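One possible reason the `extract()` calls above silently do nothing is how Beautiful Soup matches the multi-valued `class` attribute: a string containing spaces is compared against the whole attribute value, so it fails if the element carries any extra class. The sketch below uses hypothetical markup (the extra `js-expandWatch` class is an assumption, not checked against the live page) to compare the lookups:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the blockquote has one class more than the string
# used in the question (js-expandWatch is an assumption for illustration).
html = """
<div class="bbWrapper">
  <blockquote class="bbCodeBlock bbCodeBlock--expandable bbCodeBlock--quote js-expandWatch">
    <div class="bbCodeBlock-content">quoted text</div>
  </blockquote>
  reply text
</div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="bbWrapper")

# A multi-class string must equal the full class attribute value, so this misses.
exact = div.find("blockquote", class_="bbCodeBlock bbCodeBlock--expandable bbCodeBlock--quote")

# A single class name matches any element that has that class among others.
single = div.find("blockquote", class_="bbCodeBlock--quote")

# A CSS selector requires all listed classes, in any order, and ignores extra ones.
css = div.select_one("blockquote.bbCodeBlock.bbCodeBlock--quote")

# A bare string as the first argument is a tag name, not a class, so nothing matches.
tag_name = div.find("bbCodeBlock-content")

print(exact, single is not None, css is not None, tag_name)
```

If the live page behaves like this, matching on a single class or using `select()` would let the quote actually be removed.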
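An approach that works is to select the `bbCodeBlock-expandContent` divs directly and take the third one:

```python
import requests
from bs4 import BeautifulSoup


def Main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Take the third matching block (index 2) and strip surrounding whitespace
    target = soup.find_all(
        "div", class_="bbCodeBlock-expandContent")[2].get_text(strip=True)
    print(target)


Main("https://www.ranger5g.com/forum/threads/pre-collision-assist.3239/")
```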
Output:
Subaru dealer by me uses an orange construction cone for demo. Find one and try it. Won’t hurt anything if it doesn’t work.
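Indexing with `[2]` depends on where that block sits on the page. A sketch of an alternative, assuming the XenForo-style class names used in the question, removes every quote and signature first and then reads each post's text. The `post_texts` helper and the sample HTML are hypothetical:

```python
from bs4 import BeautifulSoup


def post_texts(html: str) -> list[str]:
    """Return each post's visible text with quotes and signatures removed.

    Hypothetical helper: the selectors assume the class names from the
    question (bbWrapper, bbCodeBlock--quote, message-signature).
    """
    soup = BeautifulSoup(html, "html.parser")

    # Remove every quoted block and signature from the whole document first.
    # select() matches these classes even if the element has extra classes.
    for unwanted in soup.select("blockquote.bbCodeBlock--quote, aside.message-signature"):
        unwanted.decompose()

    # Then collect the remaining text of each post body, in document order.
    return [
        div.get_text(strip=True, separator=" ")
        for div in soup.select("div.bbWrapper")
    ]


# Hypothetical sample: the second post quotes the first one.
sample = """
<article>
  <div class="bbWrapper">First post</div>
  <aside class="message-signature">sig</aside>
</article>
<article>
  <div class="bbWrapper">
    <blockquote class="bbCodeBlock bbCodeBlock--expandable bbCodeBlock--quote js-expandWatch">
      <div class="bbCodeBlock-content">First post</div>
    </blockquote>
    Reply text
  </div>
</article>
"""
print(post_texts(sample))  # ['First post', 'Reply text']
```

With the live page, the same function could be given `requests.get(url).text`; whether the post at a given index is the one you expect depends on the actual markup.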