Why is scrapy-selenium not returning any data?



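I am trying to scrape the "product details" section (a table) and the "please select a size" section (JavaScript buttons) from this JS-based web page: https://www.breuninger.com/de/damen/luxus/bekleidung-jacken-maentel/. I am using scrapy-selenium to scrape it. The code below scrapes everything except these two sections. I tested the same XPaths with Selenium alone and got results, but not with scrapy-selenium. I also tried scrapy-splash, but it cannot even render the whole page. I have looked through previous questions and found no answer. What exactly am I doing wrong?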

import scrapy
from scrapy_selenium import SeleniumRequest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time


class ProductsSpider(scrapy.Spider):
    name = 'products'
    allowed_domains = ['www.breuninger.com']

    def start_requests(self):
        options = webdriver.ChromeOptions()
        options.headless = True
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
        driver.set_window_size(1920, 1080)
        driver.get("https://www.breuninger.com/de/damen/luxus/bekleidung-jacken-maentel/")
        time.sleep(5)
        banner_btn = driver.find_element(By.XPATH, "//div[@class='banner-actions-container']/button")
        banner_btn.click()
        time.sleep(3)
        links = driver.find_elements(By.XPATH, "//suchen-produktliste[@id='produktliste']/section/div/suchen-produkt/div/a")

        for link in links:
            href = link.get_attribute('href')
            yield SeleniumRequest(
                url=href,
                callback=self.parse,
                wait_time=1
            )

        driver.quit()
        return super().start_requests()

    def parse(self, response):
        yield {
            'Bold-title': response.xpath("(//span[@itemprop='name'])[1]/text()").get(),
            'Price': response.xpath("//div[@itemprop='offers']/span/text()").get(),
            'Beschreibung': response.xpath("//div[@class='bewerten-textformat--produktdetails-detail']/div/ul/li/text()").getall()
        }

You really don't need the heavy guns of Selenium here to get product details such as price, description, and brand.

You can try:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

url = "https://www.breuninger.com/de/damen/luxus/bekleidung-jacken-maentel/"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36",
}

# Fetch the listing page and select every product link.
soup = (
    BeautifulSoup(
        requests.get(url, headers=headers).text,
        "lxml",
    ).select(".suchen-produkt a")
)

# Collect brand, name, and price for each product.
products = [
    [
        i.select_one(".suchen-produkt__marke").getText(),
        i.select_one(".suchen-produkt__name").getText(),
        i.select_one(".suchen-produkt__preis").getText(),
    ] for i in soup
]

df = pd.DataFrame(products, columns=["Brand", "Description", "Price"])
df.to_csv("products.csv", index=False)
print(tabulate(df, headers="keys", tablefmt="grid"))

This should give you a table like the following (along with a .csv file).

+----+-------------------------+--------------------------------------------------------------+------------------+
|    | Brand                   | Description                                                  | Price            |
+====+=========================+==============================================================+==================+
|  0 | BURBERRY                | Jacke BINHAM                                                 | 1.549,99 €       |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  1 | BURBERRY                | Trenchcoat KENSINGTON                                        | 1.849,99 €       |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  2 | RALPH LAUREN Collection | Blouson mit Schmucksteinen                                   | 2.050 €          |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  3 | BURBERRY                | Trenchcoat KENSINGTON                                        | 1.849,99 €       |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  4 | BURBERRY                | Trenchcoat WATERLOO                                          | 1.889,99 €       |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  5 | BURBERRY                | Trenchcoat ISLINGTON                                         | 1.849,99 €       |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  6 | BURBERRY                | Trenchcoat WATERLOO                                          | 1.889,99 €       |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  7 | MONCLER                 | Daunenweste LIANE                                            | 495 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  8 | BURBERRY                | Trenchcoat                                                   | 1.849,99 €       |
+----+-------------------------+--------------------------------------------------------------+------------------+
|  9 | MONCLER                 | Jacke im Materialmix                                         | 650 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 10 | MONCLER                 | Jacke AGDE                                                   | 695 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 11 | MONCLER                 | Jacke CECILE                                                 | 520 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 12 | MONCLER                 | Jacke TIYA                                                   | 695 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 13 | MONCLER                 | Daunenweste LIANE                                            | 495 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 14 | MONCLER                 | Daunenparka HERMANVILLE                                      | 1.250 €          |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 15 | BURBERRY                | Trenchcoat KENSINGTON                                        | 999,99 €         |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 16 | MONCLER                 | Jacke AGDE                                                   | 695 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 17 | MONCLER                 | Daunenweste ALPISTE                                          | 750 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 18 | MONCLER                 | Regenmantel HIENGU                                           | 735 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 19 | MONCLER                 | Jacke TIYA                                                   | 695 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
| 20 | MONCLER                 | Jacke HOULGATE                                               | 780 €            |
+----+-------------------------+--------------------------------------------------------------+------------------+
and more ...

P.S. The last XPath does not match anything on that page, which is why you get an empty list.
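
The empty-list behaviour can be reproduced offline. The following is a minimal sketch, not the real page: it uses the standard library's ElementTree as a stand-in for Scrapy's selectors, and the HTML fragment and the class name `details` are made up for illustration. It only shows that a path matching nothing yields an empty list instead of raising an error, which is also how `.getall()` behaves in Scrapy.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment (an assumption, NOT the real Breuninger markup).
PAGE = """
<html><body>
  <span itemprop="name">Trenchcoat KENSINGTON</span>
  <div class="details"><ul><li>Cotton</li><li>Belt</li></ul></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)


def texts(path):
    """Return the text of every element matched by path ([] if none match)."""
    return [el.text for el in root.findall(path)]


# This path exists in the fragment, so both list items are returned.
found = texts(".//div[@class='details']/ul/li")

# This class name is absent from the fragment, so nothing matches
# and the result is an empty list rather than an exception.
missing = texts(".//div[@class='bewerten-textformat--produktdetails-detail']/div/ul/li")

print(found)
print(missing)
```

An empty list from the spider therefore points at the selector (or at markup that differs from what the browser's inspector shows) rather than at a crash in the request itself.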
