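How can I print "date not available" in pandas for rows that are missing a date, as I already do when printing the scraped page? Here is my code, which gives me two different outputs: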
import time

from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Chrome(executable_path="./drivers/chromedriver")
browser.get('https://www.ebay.it/sch/i.html?_from=R40&_nkw=3060&_sacat=0&_sop=15')
time.sleep(2)  # give the page time to load
# Accept the GDPR cookie banner
browser.find_element_by_id("gdpr-banner-accept").click()

# Parse the rendered page and collect the listings
soup = BeautifulSoup(browser.page_source, "html.parser")
results = soup.find("ul", {"class": "srp-results"})
items = results.findAll("li")

# First version: print every listing, with a fallback when the date is missing
for item in items:
    titleElement = item.find("h3")
    priceElement = item.find("span", {"class": "s-item__price"})
    dateElement = item.find("span", {"class": "s-item__time-end"})
    if titleElement:
        # Remove the "new listing" label from the title
        newListingTag = titleElement.find("span", {"class": "LIGHT_HIGHLIGHT"})
        if newListingTag:
            newListingTag.extract()
        print(titleElement.text)
    if priceElement:
        print(priceElement.text)
    if dateElement:
        print(dateElement.text)
    else:
        print("The date is not available")

# Second version: collect the listings as rows for a DataFrame
rows = []
for item in items:
    titleElement = item.find("h3")
    priceElement = item.find("span", {"class": "s-item__price"})
    dateElement = item.find("span", {"class": "s-item__time-end"})
    # A row is only kept when all three elements were found
    if titleElement and priceElement and dateElement:
        newListingTag = titleElement.find("span", {"class": "LIGHT_HIGHLIGHT"})
        if newListingTag:
            newListingTag.extract()
        row = [dateElement.text, titleElement.text, priceElement.text]
        rows.append(row)

import pandas as pd

# Show the full frame without wrapping or truncation
pd.set_option('expand_frame_repr', False)
pd.set_option("display.max_rows", None, "display.max_columns", None, "display.max_colwidth", None)
df = pd.DataFrame.from_records(rows, columns=["Purchase Date", "Title", "Price"])
print(df)
This is the first output:
NUOVO MSI GeForce RTX 3060 12GB GPU
EUR 488,25
(Martedì, 13:02)
GIGABYTE GeForce RTX 3060 Ti OC 8GB Eagle
EUR 499,88
(Domenica, 14:47)
MSI GTX 1650 VENTUS XS 4G OC Graphics card GF GTX 1650 4 GB GDDR5 V809-3060R
EUR 532,50
The date is not available
...
etc. (it includes about 50 elements)
This, on the other hand, is the second output (compared with the first one, it is missing about 34 elements. Why?):
0 (Martedì, 13:02) NUOVO MSI GeForce RTX 3060 12GB GPU EUR 488,25
1 (Domenica, 14:47) GIGABYTE GeForce RTX 3060 Ti OC 8GB Eagle EUR 499,88
2 (Giovedì, 22:11) Inno 3d GeForce RTX 3060 TWIN x2 OC 12gb GDDR 6 GPU neu&ovp - spedizione veloce ✅ EUR 510,00
...
etc.
I also noticed that the second output only shows the rows that have an exact date.
Thanks for your answers, and sorry for my poor English.
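I'm not very familiar with Selenium, but after looking at the site, it seems that not all listings have the s-item__time-end class. Another class to include could be s-item__purchase-options-with-icon, which covers Buy It Now and other variants. Just add another variable, like this: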
dateElement = item.find("span", {"class": "s-item__time-end"})
altDateElement = item.find("span", {"class": "s-item__purchase-options-with-icon"})
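One way to combine the two lookups is a small helper that returns the text of the first matching span and falls back to a placeholder. This is only a sketch: `first_text` and `SAMPLE_HTML` are made-up names, and the markup is invented for illustration rather than taken from the real eBay page.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for three listings (not real eBay page source).
SAMPLE_HTML = """
<ul class="srp-results">
  <li><h3>Card A</h3><span class="s-item__time-end">(Martedì, 13:02)</span></li>
  <li><h3>Card B</h3><span class="s-item__purchase-options-with-icon">Compralo Subito</span></li>
  <li><h3>Card C</h3></li>
</ul>
"""


def first_text(item, class_names, default="The date is not available"):
    """Return the text of the first <span> matching any class name, else default."""
    for name in class_names:
        element = item.find("span", {"class": name})
        if element is not None:
            return element.text
    return default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
date_classes = ["s-item__time-end", "s-item__purchase-options-with-icon"]
dates = [first_text(li, date_classes) for li in soup.find_all("li")]
print(dates)
# ['(Martedì, 13:02)', 'Compralo Subito', 'The date is not available']
```

The order of `date_classes` decides which class wins when a listing has both.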
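With the original setup, whenever dateElement is None the validation check fails and the listing is skipped, which is why the second output has fewer rows. The check is this line:
if titleElement and priceElement and dateElement: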
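A tiny illustration of that check, using plain strings as hypothetical stand-ins for the three lookups on a listing that has no date:

```python
# Hypothetical stand-in values: title and price were found, the date was not.
title, price, date = "MSI GTX 1650 VENTUS XS 4G OC", "EUR 532,50", None

# `and` returns the first falsy operand, so a single None makes the whole test falsy.
keep_row = bool(title and price and date)
print(keep_row)  # False: this listing would never reach rows.append(...)
```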
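You can also fall back to a placeholder string when dateElement is missing. Note that putting `or ""` directly on the element is not enough: an empty string is also falsy, so the check above would still fail, and a str has no .text attribute. Apply the fallback to the text instead, and drop dateElement from the check:
dateElement = item.find("span", {"class": "s-item__time-end"})
dateText = dateElement.text if dateElement else "The date is not available"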
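Putting the pieces together, here is a minimal sketch of a DataFrame loop that keeps listings without a date. It assumes title and price should still be required; `NO_DATE` and `SAMPLE_HTML` are made-up names, and the markup is invented so the example runs without a browser.

```python
from bs4 import BeautifulSoup
import pandas as pd

NO_DATE = "The date is not available"

# Made-up markup standing in for the eBay results list (not real page source).
SAMPLE_HTML = """
<ul class="srp-results">
  <li><h3><span class="LIGHT_HIGHLIGHT">Nuova inserzione</span>Card A</h3>
      <span class="s-item__price">EUR 488,25</span>
      <span class="s-item__time-end">(Martedì, 13:02)</span></li>
  <li><h3>Card B</h3>
      <span class="s-item__price">EUR 532,50</span></li>
  <li>not a listing</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find("ul", {"class": "srp-results"})

rows = []
for item in results.find_all("li"):
    title = item.find("h3")
    price = item.find("span", {"class": "s-item__price"})
    date = item.find("span", {"class": "s-item__time-end"})
    # Title and price are still required; the date is optional here.
    if title is None or price is None:
        continue
    # Remove the "new listing" label from the title, if present.
    new_listing_tag = title.find("span", {"class": "LIGHT_HIGHLIGHT"})
    if new_listing_tag is not None:
        new_listing_tag.extract()
    date_text = date.text if date is not None else NO_DATE
    rows.append([date_text, title.text, price.text])

df = pd.DataFrame.from_records(rows, columns=["Purchase Date", "Title", "Price"])
print(df)
```

With this sample, the frame has two rows: Card A with its date and Card B with the placeholder. The third `<li>` is skipped because it has neither a title nor a price.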