I'm trying to track price changes for a product, but I'm getting mixed results with both Selenium and BeautifulSoup.
from selenium import webdriver
from bs4 import BeautifulSoup
import re
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://www.cdkeys.com/playstation-network-psn/playstation-plus/1-year-playstation-plus-membership-ps3-ps4-ps-vita-digital-code')
search = driver.find_element_by_xpath('.//span[@class="price"]')
soup = BeautifulSoup(driver.page_source,'html.parser')
price = soup.find_all('span',{'class':['price']})
search comes back as an object of some kind, but its text attribute is empty.
price gets dozens of results, including the one I'm interested in. I believe the price is loaded from some API, but I can't find it in dev tools when filtering by XHR.
First of all, in this case you don't need to combine BeautifulSoup and Selenium; either one can do the whole job on its own.

I would go with BeautifulSoup (plus requests).

The reason: this page doesn't need JavaScript to render the price, and in terms of performance BeautifulSoup is much lighter than Selenium.

Regarding the scraping itself: you got dozens of results because you searched only by the element's class name, and many elements on the page share that class. One solution is to combine multiple attributes to pin down the right element, as in the code below.
from bs4 import BeautifulSoup as BS
import requests
url = "https://www.cdkeys.com/playstation-network-psn/playstation-plus/1-year-playstation-plus-membership-ps3-ps4-ps-vita-digital-code?mw_aref=xcalibur"
r = requests.get(url)
soup = BS(r.text, features='html.parser')
product_main = soup.find('div', {'class': 'product-info-main'})
product_price = product_main.find('span', {'data-price-type': 'finalPrice', 'class': 'price-wrapper'})
print(product_price.text)
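Since the goal is watching price changes over time, you will probably want the price as a number rather than a string. A minimal sketch of that step — the `parse_price` helper and the `'$19.99'`-style input are illustrative assumptions about the scraped text, not something the site guarantees:

```python
import re

def parse_price(text):
    """Extract the first numeric amount from a scraped price string,
    e.g. '$19.99' -> 19.99. Returns None if no number is found."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    return float(match.group()) if match else None

print(parse_price('$19.99'))  # 19.99
```

You could call this on `product_price.text` and store the result with a timestamp to build a price history.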
If you do stick with Selenium, you should add a wait before fetching the element so it has fully loaded:
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome(executable_path='chromedriver.exe')
wait = WebDriverWait(driver, 20)
driver.get('https://www.cdkeys.com/playstation-network-psn/playstation-plus/1-year-playstation-plus-membership-ps3-ps4-ps-vita-digital-code')
search = wait.until(EC.presence_of_element_located((By.XPATH, './/span[@class="price"]')))
time.sleep(0.5)
prices_in_usd = driver.find_elements_by_xpath("//span[@class='price' and contains(text(),'$')]")
soup = BeautifulSoup(driver.page_source,'html.parser')
price = soup.find_all('span',{'class':['price']})
If you want the element's text, don't forget to extract it:
search_text = search.text
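Whichever route you take, the broad `find_all` on `span.price` still returns many tags, some with empty text. One way to make that output usable is to collapse the tags' `.text` values down to the distinct non-empty strings. A minimal sketch — the `distinct_prices` helper and the sample strings are illustrative, standing in for `[tag.text for tag in price]`:

```python
def distinct_prices(texts):
    """Collapse a list of scraped price strings to the distinct,
    non-empty values, preserving first-seen order."""
    seen = []
    for t in texts:
        t = t.strip()
        if t and t not in seen:
            seen.append(t)
    return seen

print(distinct_prices(['$19.99', '', '$19.99', '$49.99']))  # ['$19.99', '$49.99']
```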