Why is my web scraper not scraping data from the Puma website?



I am writing a script to scrape product details from Puma's official website - https://in.puma.com/in/en/home

The script runs without any errors. However, it simply does not scrape any details and returns empty values. I previously used XPath, but I changed the code to use class names to scrape data such as the product price, title, and colors. Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd

def puma_product_scraper():
    titles = []
    prices = []
    p_colors = []
    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    options.add_argument("window-size=1200x600")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    browser = webdriver.Chrome(ChromeDriverManager().install(), options=options)
    browser.get("https://in.puma.com/in/en/search?q=running&originalphrase=running")
    browser.implicitly_wait(10)
    # Begin scraping elements
    # browser.find_element_by_class_name('btn').click()
    product_cards_container = browser.find_element(By.CLASS_NAME, "product-grid-container")
    product_cards = product_cards_container.find_elements(By.CLASS_NAME, "col-lg-3 col-sm-4 col-md-4 col-6")
    for card in product_cards:
        title = card.find_element(By.CLASS_NAME, "product-tile-info-text")
        colors = card.find_element(By.CLASS_NAME, "product-tile-swatch-container")
        price = card.find_element(By.CLASS_NAME, "product-tile-price-new product-tile__price--new")
        titles.append(title.text)
        p_colors.append(colors.text)
        prices.append(price.text)
    # Storing scraped content
    df = pd.DataFrame({'Product Name': titles, 'Price': prices, 'Colors': p_colors})
    df.to_csv('PUMA_Scraping.csv', index=False)
    df.head()
    print('Total items related to running on Puma:', len(titles))

This is the output I get:

Total items related to running on Puma: 0

If there were a problem with the class names, it should show an error message, but I don't get any such message. Where exactly is the problem? Please help.
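A likely reason no error is raised: `find_elements` (plural) returns an empty list rather than throwing when nothing matches, so the loop body simply never runs. Also, `By.CLASS_NAME` accepts only a single class name; a space-separated value such as "col-lg-3 col-sm-4 col-md-4 col-6" is a compound class attribute and will typically match nothing. A minimal sketch (pure Python, no browser needed; the helper name `classes_to_css` is hypothetical) of converting such a compound class string into an equivalent CSS selector:

```python
def classes_to_css(class_attr: str, tag: str = "div") -> str:
    """Convert a space-separated class attribute into a CSS selector,
    e.g. "col-lg-3 col-6" -> "div.col-lg-3.col-6"."""
    return tag + "".join("." + c for c in class_attr.split())

selector = classes_to_css("col-lg-3 col-sm-4 col-md-4 col-6")
print(selector)  # div.col-lg-3.col-sm-4.col-md-4.col-6
```

The resulting selector could then be passed to `find_elements(By.CSS_SELECTOR, selector)` instead of `By.CLASS_NAME`.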

Edit: As @Joran suggested, I applied a variant of find_element.

wait = WebDriverWait(browser, 10)
# browser.find_element_by_class_name('btn').click()
product_cards_container = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "product-grid-container")))

It still did not work.

In this case it is better to use CSS_SELECTOR. I extracted the titles; the prices can be obtained the same way.

Code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r"C:\Users\Inc\Desktop\Selenium+Python\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://in.puma.com/in/en/search?q=running&originalphrase=running")
titles = []
prices = []
p_colors = []
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'product-grid-container')))
product_cards = driver.find_elements(By.CSS_SELECTOR, "div.row.product-grid.no-gutters div[data-grid-tile-wrapper]")
titles = driver.find_elements(By.CSS_SELECTOR, "a.product-tile-title.product-tile__title.pdp-link.line-item-limited")
for title in titles:
    print(title.text)
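Once the title and price elements are collected, pairing them up and writing the CSV as in the original script can be sketched as below. Plain strings stand in for the `.text` values of the WebElements (the sample product names and prices are made up for illustration), and the standard csv module is used so the snippet runs without a browser:

```python
import csv

# Hypothetical stand-ins for [el.text for el in titles] / prices
titles = ["Velocity Nitro 2", "Deviate Nitro"]
prices = ["Rs. 9,499", "Rs. 13,499"]

# Pair each title with its price, row by row
rows = list(zip(titles, prices))
with open("PUMA_Scraping.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Product Name", "Price"])
    writer.writerows(rows)

print("Total items related to running on Puma:", len(rows))
```

With live scraping, `zip` keeps the two lists aligned only if the page yields one price element per title element, so collecting both from each product card (as the original loop did) is the safer design.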
