Retrieving data from the web



I have this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_BV= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
elementHTML=driver_BV.find_element("class name", 'productInfoWrapper')
Final=[]
children_element=elementHTML.find_elements("class name", 'plContent')
print('''
a. Retrieve data
b. Create the graph
c. Display the matrix
d. Save to Excel file
e. Exit
''')
while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            product_BV.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_BV.append(price)
            print ('Products:', product_BV)
            print ('Prices:', price_BV)
            price=price.replace("$","")
            Final.append(float(price))
            Product_title_series=pd.Series(product_BV)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_BV == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_BV == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_BV == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_BV == 'e':
        print("CY@ exiting...")
        break

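I don't know what I'm doing wrong, but I can't get it to work! I need it for a university project and I'm stuck. When I type "a", nothing happens in the console, and if I type any other letter I get: "name 'Product_Matrix_Framework' is not defined". Please help! Thank you.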
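
The error message quoted above is what Python raises when a name is read before it has ever been assigned. In the posted code, Product_Matrix_Framework is only assigned under option 'a', so options b, c and d fail until 'a' has actually produced rows. A minimal sketch of one way to guard against that, assuming a hypothetical helper run_option and using None as a "nothing scraped yet" marker (neither is part of the original code):

```python
def run_option(option, frame):
    """Return a status string for menu options that need scraped data.

    `frame` stands in for Product_Matrix_Framework; None means option 'a'
    has not produced any rows yet. (Hypothetical helper, for illustration.)
    """
    needs_data = {"b": "plot", "c": "display", "d": "save"}
    if option in needs_data:
        if frame is None:
            return "No data yet: run option 'a' first."
        return f"ready to {needs_data[option]}"
    return "unknown option"


# Start with an explicit "nothing scraped" marker instead of an unset name.
frame = None
print(run_option("b", frame))            # guard triggers, no NameError
print(run_option("c", [("bag", 1.0)]))   # any non-None value passes the guard
```

In the real script the same idea would be Product_Matrix_Framework = None before the while loop, plus an "is None" check at the top of branches b, c and d.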

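After looking at some documentation and at the site itself (I'm assuming you want the elements with the class productDesc), I think I see what you are trying to do.

If you want to select elements by CSS selector (here the class productDesc, which is written .productDesc as a selector), you should use:

title_elements = child_element.find_elements("css selector", ".productDesc")

This should return a list of all child elements that match the .productDesc selector, which you can then iterate over to get each element's text. Something like: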

titles = []
for title_element in title_elements:
    titles.append(title_element.get_attribute("innerText"))

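Looking at the site, each child_element may contain one or more elements with the productDesc class, so you should store them in a list in case there is more than one. Your code seems to assume there is only one.

For example: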


...
while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            titles = []
            for title_element in child_element.find_elements("css selector", ".productDesc"):
                titles.append(title_element.get_attribute("innerText"))
            product_BV.append(titles)  # product_BV is now a list of lists
...
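
The inner loop above can be pulled into a small helper that can be checked without opening a browser. This is only a sketch: collect_texts, FakeElement and FakeTile are hypothetical names, and the fake classes merely imitate the two Selenium methods the helper calls (find_elements and get_attribute).

```python
def collect_texts(parent, css_selector):
    """Return the stripped, non-empty innerText of every match under `parent`."""
    texts = []
    for element in parent.find_elements("css selector", css_selector):
        text = (element.get_attribute("innerText") or "").strip()
        if text:
            texts.append(text)
    return texts


# Stand-ins for Selenium objects so the sketch runs without a browser.
class FakeElement:
    def __init__(self, text):
        self._text = text

    def get_attribute(self, name):
        return self._text if name == "innerText" else None


class FakeTile:
    def __init__(self, texts):
        self._children = [FakeElement(t) for t in texts]

    def find_elements(self, by, value):
        # Ignores the locator and returns every child, which is enough here.
        return self._children


tile = FakeTile(["", "  Leather Clutch \n"])
print(collect_texts(tile, ".productDesc"))
```

With a real driver, the call would be collect_texts(child_element, ".productDesc") inside the for loop. Empty strings are skipped, so a blank first match does not end up in the list.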

I think elementHTML is unnecessary. You can build children_element directly by searching with a CSS selector:

...
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
Final=[]
children_element=driver_BV.find_elements("css selector", ".plContent .galleryItem")

That finds 60 products. After that, errors still occur, and I think you need to fix those inside the for loop.
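
One error that may surface in that loop is the float() conversion, which fails on any label that is not a bare number after the currency symbol is removed. A hedged sketch of a more tolerant parser follows. parse_price is a hypothetical helper, and it assumes a comma is a thousands separator and a dot is the decimal point, which may not hold for every locale.

```python
import re


def parse_price(text):
    """Extract the first number from a price label, or None if there is none.

    Assumption: ',' separates thousands and '.' is the decimal point.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if match is None:
        return None
    return float(match.group().replace(",", ""))


for label in ["$24.98", "€1,234.50", "", "Sold out"]:
    print(repr(label), "->", parse_price(label))
```

Returning None instead of raising lets the loop decide whether to skip a product or record it without a price.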

@Nathcat

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_RA= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_RA.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_RA=[]
price_RA=[]
elementHTML=driver_RA.find_element("class name", 'plContent')
Final=[]
children_element=elementHTML.find_elements("class name", 'productInfoWrapper')
print('''
a. Retrieve data
b. Create the graph
c. Display the matrix
d. Save to Excel file
e. Exit
''')
while True:
    select_option_RA = input("Select option:")
    if select_option_RA == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            # Trying to print every SECOND string from productDesc (because for some reason
            # every first innerText from productDesc is empty)
            product_RA.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_RA.append(price)
            print ('Products:', product_RA)
            print ('Prices:', price_RA)
            price=price.replace("€","")
            Final.append(price)
            Product_title_series=pd.Series(product_RA)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_RA == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_RA == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_RA == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_RA == 'e':
        print("Exiting beep boop beep.")
        break
I did this, but I can't get the products: if I run it and type "a", it shows the prices but not the products.
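
One possible cause, not verified against the page itself: innerText reflects rendered text, so an element that is hidden or not yet laid out can report an empty string, while textContent does not depend on rendering. A minimal sketch that tries both, using a hypothetical read_text helper and a stand-in element class so it runs without a browser:

```python
def read_text(element):
    """Return the first non-empty value among innerText and textContent."""
    for attribute in ("innerText", "textContent"):
        value = element.get_attribute(attribute)
        if value and value.strip():
            return value.strip()
    return ""


# Stand-in for a Selenium WebElement; only get_attribute is imitated.
class FakeElement:
    def __init__(self, **attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        return self._attributes.get(name)


hidden = FakeElement(innerText="", textContent="  Quilted Clutch ")
visible = FakeElement(innerText="Leather Clutch", textContent="Leather Clutch")
print(read_text(hidden))
print(read_text(visible))
```

In the loop this would replace the direct get_attribute('innerText') call, for example title = read_text(child_element.find_element("class name", 'productDesc')).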
