I'm using Python and trying to get the price ($25.99) from the following Amazon page.
https://www.amazon.com/Guffercty-kred-Sublimation-Mechanical-Keyboard/dp/B09HWZQQZJ/ref=sr_1_14?crid=3UHD6OMRY6RYG&keywords=keycaps&qid=1667444474&qu=eyJxc2MiOiI4Ljc5IiwicXNhIjoiOC41OCIsInFzcCI6IjcuOTMifQ%3D%3D&sprefix=keycap%2Caps%2C275&sr=8-14&th=1
I tried both Beautiful Soup and Selenium, but my Selenium code doesn't work.
#with beautiful soup
import requests
from bs4 import BeautifulSoup
PRODUCT="https://www.amazon.com/Guffercty-kred-Sublimation-Mechanical-Keyboard/dp/B09HWZQQZJ/ref=sr_1_14?crid=3UHD6OMRY6RYG&keywords=keycaps&qid=1667444474&qu=eyJxc2MiOiI4Ljc5IiwicXNhIjoiOC41OCIsInFzcCI6IjcuOTMifQ%3D%3D&sprefix=keycap%2Caps%2C275&sr=8-14&th=1"
response = requests.get(
    PRODUCT,
    headers={
        "Accept-Language": "ko,en-US;q=0.9,en;q=0.8,sv;q=0.7,ja;q=0.6",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
    },
)
soup = BeautifulSoup(response.text, "html.parser")
# getText() returns a string such as "$25.99"; float() cannot parse it with the "$" attached
price = soup.find(name="span", class_="a-offscreen").getText()
print(price)
The code above works perfectly for me and returns the price: it prints $25.99 in the terminal.
However, the following code using Selenium doesn't work.
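Since the scraped text carries the currency symbol, turning it into a number needs one extra step. The following is only a minimal sketch, assuming a US-style price string like the one above; `parse_price` is a hypothetical helper, not part of Beautiful Soup.

```python
import re
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Extract the first number from a price string such as '$25.99' or '$1,299.00'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no numeric price found in {text!r}")
    # Drop thousands separators before converting.
    return Decimal(match.group().replace(",", ""))


print(parse_price("$25.99"))
```

`Decimal` is used here rather than `float` so the cents are kept exactly; `float(...)` on the cleaned string would also work if rounding is not a concern.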
#with selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("/my/chrome/driver/path/chromedriver")
driver = webdriver.Chrome(service=service)
driver.get(url="https://www.amazon.com/Guffercty-kred-Sublimation-Mechanical-Keyboard/dp/B09HWZQQZJ/ref=sr_1_14?crid=3UHD6OMRY6RYG&keywords=keycaps&qid=1667444474&qu=eyJxc2MiOiI4Ljc5IiwicXNhIjoiOC41OCIsInFzcCI6IjcuOTMifQ%3D%3D&sprefix=keycap%2Caps%2C275&sr=8-14&th=1")
price = driver.find_element(By.CSS_SELECTOR, 'span .a-offscreen')
print(price.text)
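Unlike the bs4 code, the Selenium code prints nothing in the terminal.
I assumed find_element(By.CSS_SELECTOR, 'span .a-offscreen') in Selenium would work the same way as find(name='span', class_='a-offscreen') in bs4.
I also tried other By locators, including XPATH, but those didn't work either. Am I missing something?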
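You may need to wait for the page to finish rendering. Or you may be matching a different element: I see 60 items matching that selector.
I would try this selector instead: div#corePrice_feature_div span .a-offscreen
Then wait for that element to be displayed and enabled. https://www.selenium.dev/documentation/webdriver/waits/explicit-wait
If you don't want to write your own lambda, Selenium's Python bindings ship an expected_conditions module with some basic ready-made waits. One of them is text_to_be_present_in_element: https://www.selenium.dev/selenium/docs/api/py/webdriver_support/selenium.webdriver.support.expected_conditions.html?highlight=expected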
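There are many span elements with this class, so it's better to go through the parent div to make the selector specific to the main item:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 2 seconds for the main item's price element to be present in the DOM
price = WebDriverWait(driver, 2).until(
    EC.presence_of_element_located(
        (By.CSS_SELECTOR, "div#corePrice_feature_div span.a-offscreen")
    )
)
# textContent also returns text that is not rendered on screen
print(price.get_attribute("textContent"))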
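price.text is empty because .text returns only visible data, while (as the class name suggests) this field is off-screen. The visible text comes from an aggregation of other spans representing the symbol, the whole part, and the fraction.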
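To make that structure concrete, here is a small standard-library sketch. The markup in `SAMPLE` is hypothetical, modelled on the description above (the class names `a-price-symbol`, `a-price-whole`, and `a-price-fraction` are assumptions, and the real page may differ). It only shows where the text lives in the HTML; it does not emulate Selenium's visibility rules.

```python
from html.parser import HTMLParser

# Hypothetical markup: one off-screen span with the full price, plus the
# separate spans that together form the price a user actually sees.
SAMPLE = (
    '<span class="a-price">'
    '<span class="a-offscreen">$25.99</span>'
    '<span aria-hidden="true">'
    '<span class="a-price-symbol">$</span>'
    '<span class="a-price-whole">25.</span>'
    '<span class="a-price-fraction">99</span>'
    "</span></span>"
)


class SpanTextCollector(HTMLParser):
    """Record the text found directly inside each <span> that has a class."""

    def __init__(self):
        super().__init__()
        self._stack = []          # class (or None) of each currently open span
        self.text_by_class = {}   # class name -> accumulated direct text

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1]:
            cls = self._stack[-1]
            self.text_by_class[cls] = self.text_by_class.get(cls, "") + data


collector = SpanTextCollector()
collector.feed(SAMPLE)

offscreen = collector.text_by_class["a-offscreen"]
visible = "".join(
    collector.text_by_class[c]
    for c in ("a-price-symbol", "a-price-whole", "a-price-fraction")
)
print(offscreen)  # full price stored in the single off-screen span
print(visible)    # the same price assembled from the visible pieces
```

An HTML parser such as this one (or Beautiful Soup) reads the off-screen span directly from the source, which is why the bs4 version returned the price while Selenium's `.text` did not.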