当我尝试使用以下代码了解Sony
耳机的标题时,代码的结果是None
。
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.com/Sony-Noise-Cancelling-Headphones-
WH1000XM3/dp/B07G4MNFS1/ref=sxin_0_ac_d_rm?ac_md=0-0-c29ueQ%3D%3D-
ac_d_rm&keywords=sony&pd_rd_i=B07G4MNFS1&pd_rd_r=3e6d5325-8ee4-4ba8-a84f-
1b7cf2ce98bf&pd_rd_w=BVSFq&pd_rd_wg=I0LMZ&pf_rd_p=e2f20af2-9651-42af-9a45-
89425d5bae34&pf_rd_r=VGT25BXXZNDE3B61A994&psc=1&qid=1577253649&smid=ATVPDKIKX0DER'
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like
Gecko) Chrome/79.0.3945.88 Safari/537.36"}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
soup.prettify()
#print(soup)
title = soup.find_all('span', {'id':'productTitle'})
print(title, len(title))
电流输出为:
[ ] 0
我花了最后两个小时试图用BeautifulSoup来刮这个标题。我尝试抓取页面上的其他元素。没有成功。我尝试将原始内容发送到文件,但由于存在奇怪的字符而中断。
我尝试了艾哈迈德的答案,仍然没有得到。我尝试了一堆我在网上找到的其他解决方案,但仍然没有得到。我一辈子都想不出如何使用BeautifulSoup
来刮这个。
我知道你使用硒,所以这里是硒溶液。
from selenium import webdriver
bot = webdriver.Chrome()
bot.get("https://www.amazon.com/Sony-Noise-Cancelling-Headphones-WH1000XM3/dp/B07G4MNFS1/ref=sxin_0_ac_d_rm?ac_md=0-0-c29ueQ==-ac_d_rm&keywords=sony&pd_rd_i=B07G4MNFS1&pd_rd_r=3e6d5325-8ee4-4ba8-a84f-1b7cf2ce98bf&pd_rd_w=BVSFq&pd_rd_wg=I0LMZ&pf_rd_p=e2f20af2-9651-42af-9a45-89425d5bae34&pf_rd_r=VGT25BXXZNDE3B61A994&psc=1&qid=1577253649&smid=ATVPDKIKX0DER")
title = bot.find_element_by_id('productTitle').text
print(title)
bot.close()
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.amazon.com/Sony-Noise-Cancelling-Headphones-WH1000XM3/dp/B07G4MNFS1/ref=sxin_0_ac_d_rm?ac_md=0-0-c29ueQ==-ac_d_rm&keywords=sony&pd_rd_i=B07G4MNFS1&pd_rd_r=3e6d5325-8ee4-4ba8-a84f-1b7cf2ce98bf&pd_rd_w=BVSFq&pd_rd_wg=I0LMZ&pf_rd_p=e2f20af2-9651-42af-9a45-89425d5bae34&pf_rd_r=VGT25BXXZNDE3B61A994&psc=1&qid=1577253649&smid=ATVPDKIKX0DER")
soup = BeautifulSoup(r.text, 'html.parser')
for item in soup.findAll("span", {'id': 'productTitle'}):
print(item.get_text(strip=True))
输出:
Sony Noise Cancelling Headphones WH1000XM3: Wireless Bluetooth Over the Ear Headphones with Mic and Alexa voice control - Industry Leading Active Noise Cancellation - Black
在线运行代码:单击此处