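I'm trying to scrape specific information from a website that has 25 pages, but when I run my code I get empty lists. My output should be a dictionary containing the collected information. Any help would be greatly appreciated.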
```python
# Loading libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import mitosheet

# Assigning column names using class_ names
name_selector = "af885_1iPzH"
old_price_selector = "f6eb3_1MyTu"
new_price_selector = "d7c0f_sJAqi"
discount_selector = "._6c244_q2qap"

# Placeholder list
data = []

# Looping over each page
for i in range(1, 26):
    url = "https://www.konga.com/category/phones-tablets-5294?brand=Samsung&page=" + str(i)
    website = requests.get(url)
    soup = BeautifulSoup(website.content, 'html.parser')

    name = soup.select(name_selector)
    old_price = soup.select(old_price_selector)
    new_price = soup.select(new_price_selector)
    discount = soup.select(discount_selector)

    # Combining the elements into a zipped list to be able to pull the data simultaneously
    for names, old_prices, new_prices, discounts in zip(name, old_price, new_price, discount):
        dic = {"Phone Names": names.getText(), "New Prices": new_prices.getText(), "Old Prices": old_prices.getText(), "Discounts": discounts.getText()}
        data.append(dic)

data
```
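One detail worth checking in the loop above: `zip` stops at its shortest input, so if any one of the four `soup.select` calls returns an empty list, no dictionaries are produced at all, even when the other three lists are populated. Note also that `soup.select` takes CSS selectors, and a class selector needs a leading `.`; three of the four selectors above lack it, so they look for tags literally named `af885_1iPzH` and so on. The stdlib-only sketch below illustrates the `zip` behavior; `build_records` is a hypothetical helper and the lists are made-up placeholder text, not real scraped values.

```python
from itertools import zip_longest


def build_records(names, old_prices, new_prices, discounts, pad=False):
    """Combine parallel lists into dicts, like the loop in the question.

    With pad=False this uses zip (stops at the shortest list); with pad=True
    it uses zip_longest, which fills missing values with None instead.
    """
    combine = zip_longest if pad else zip
    return [
        {"Phone Names": n, "New Prices": new, "Old Prices": old, "Discounts": d}
        for n, old, new, d in combine(names, old_prices, new_prices, discounts)
    ]


# Hypothetical scraped text: three selectors matched, the discount selector did not.
names = ["Phone A", "Phone B"]
old_prices = ["100", "200"]
new_prices = ["90", "180"]
discounts = []

print(build_records(names, old_prices, new_prices, discounts))  # → []
# Two records, each with "Discounts": None
print(build_records(names, old_prices, new_prices, discounts, pad=True))
```

Padding with `None` makes a missing column visible instead of silently returning an empty list; the underlying fix is still a selector that actually matches, e.g. `.af885_1iPzH` rather than `af885_1iPzH`.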
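I tested the code below, and it returns 40 name values for me.
I couldn't get the values with BeautifulSoup; instead, I got them directly through Selenium.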
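A likely reason `requests` plus BeautifulSoup comes back empty while Selenium works is that the product list is rendered by JavaScript in the browser, so the class names never appear in the HTML that `requests.get` downloads. One way to check is to count elements carrying a given class in the raw HTML. The sketch below uses only the standard library's `html.parser`; `ClassCounter` and `count_class` are hypothetical helpers, and the HTML strings are made-up samples, not Konga's actual markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            # class may hold several space-separated names; value is None for bare attributes
            if attr == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name):
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up samples: server-rendered markup vs. an empty shell filled in later by JavaScript.
server_rendered = (
    '<ul><li><h3 class="af885_1iPzH">Phone A</h3></li>'
    '<li><h3 class="af885_1iPzH extra">Phone B</h3></li></ul>'
)
js_shell = '<div id="root"></div><script src="app.js"></script>'

print(count_class(server_rendered, "af885_1iPzH"))  # → 2
print(count_class(js_shell, "af885_1iPzH"))         # → 0
```

In practice you would pass `website.text` from the question's loop. A count of 0 for a class you can see in the browser's inspector suggests the data is added client-side, which is the situation where driving a real browser with Selenium helps.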
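If you decide to use Chrome and PyCharm like I did:

1. Open Chrome.
2. Click the three dots near the top-right corner.
3. Click "Settings", then "About Chrome", to see your Chrome version.
4. Download the matching driver here.
5. Save the driver in a folder on PyCharm's PATH.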
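```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.implicitly_wait(10)  # wait up to 10 s for JavaScript-rendered elements to appear

# Assigning column names using class_ names
name_selector = "af885_1iPzH"
names = []
# Looping over each of the 25 pages
for i in range(1, 26):
    url = "https://www.konga.com/category/phones-tablets-5294?brand=Samsung&page=" + str(i)
    driver.get(url)
    xPath = './/*[@class="' + name_selector + '"]'
    name = driver.find_elements(By.XPATH, xPath)
    names.extend(element.text for element in name)  # keep the visible text of each match

driver.quit()
print(names)
```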
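The XPath test `@class="…"` used above only matches elements whose class attribute is exactly that one name; an element that also carries other classes is skipped. The common XPath 1.0 idiom for "has this class among others" is `contains(concat(' ', normalize-space(@class), ' '), ' name ')`. Below is a sketch of a hypothetical helper, `class_xpath`, that builds it. Selenium's `By.CLASS_NAME` locator is a simpler alternative when you only need a single class.

```python
def class_xpath(class_name: str) -> str:
    """Build an XPath 1.0 expression matching any element whose class list includes class_name.

    Padding both the normalized class attribute and the target with spaces
    avoids partial matches such as "af885_1iPzHx".
    """
    return (
        ".//*[contains(concat(' ', normalize-space(@class), ' '), ' "
        + class_name
        + " ')]"
    )


print(class_xpath("af885_1iPzH"))
# .//*[contains(concat(' ', normalize-space(@class), ' '), ' af885_1iPzH ')]
```

In the loop it would replace `xPath`, e.g. `driver.find_elements(By.XPATH, class_xpath(name_selector))`, or equivalently `driver.find_elements(By.CLASS_NAME, name_selector)`. Class names like `af885_1iPzH` look build-generated and may change when the site is redeployed, so any of these selectors may need updating over time.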