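I have the script below to scrape data from iHerb. However, even though I put driver.close() after the 24th item, the script does not stop there: it keeps scraping data.
Is there any way to stop the loop and close the browser once the 24th item is finished?
Thanks a lot!
Please see the script below: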
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_path = "/path/to/chromedriver"  # placeholder: set this to your chromedriver location
driver = webdriver.Chrome(service=Service(chrome_path))
driver.get("https://ca.iherb.com/c/Vitamins?noi=24")
wait = WebDriverWait(driver, 10)

# close the pop-up
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "svg[data-ga-event-action='list-close']"))).click()

# store all the product links in a list
item_links = [item.get_attribute("href") for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".absolute-link-wrapper > a.product-link")))]

review_titles = list()
review_contents = list()
product_helpful = list()
product_not_helpful = list()
member_rating = list()
total_rate = list()

# iterate over the links
for item_link in item_links:
    driver.get(item_link)
    # locate the `View All Reviews` link and read its URL
    all_reviews_link = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "span.all-reviews-link > a")))
    x = all_reviews_link.get_attribute("href")
    MAX_PAGE_NUM = 2
    for i in range(1, MAX_PAGE_NUM + 1):
        page_num = str(i)
        url = x + '?&p=' + page_num
        print(url)
        driver.get(url)
        review_containers = driver.find_elements(By.CLASS_NAME, 'review-row')
        for containers in review_containers:
            total_rate.append(driver.find_element(By.CLASS_NAME, 'css-i36p8g').text)
            review_contents.append(containers.find_element(By.CLASS_NAME, 'review-text').text)
            product_helpful.append(containers.find_element(By.CSS_SELECTOR, '[title="Helpful"] span').text)
            product_not_helpful.append(containers.find_element(By.CSS_SELECTOR, '[title="Unhelpful"] span').text)
            # count the filled (non-transparent) stars to get the member's rating
            stars = containers.find_elements(By.CLASS_NAME, "css-172co2l")
            rating = 0
            for star in stars:
                star_color = star.find_element(By.TAG_NAME, "path").get_attribute("fill")
                #print(star_color)
                if star_color != "transparent":
                    rating += 1
            member_rating.append(rating)
        time.sleep(5)  # slow the script down

driver.close()
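You could try driver.quit(). It closes every browser window opened by Selenium and ends the WebDriver session, whereas .close() closes only the current window. Both normally work, but if .close() doesn't, try .quit().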
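A minimal sketch of one way to cap the loop at 24 products and make sure quit() always runs, assuming the per-product scraping (the body of the for item_link in item_links: loop) is moved into a callback. The names scrape_first_n and scrape_one are hypothetical; driver can be a Selenium WebDriver or any object with a .quit() method.

```python
from itertools import islice
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def scrape_first_n(driver, links: Iterable[str], n: int,
                   scrape_one: Callable[[object, str], T]) -> List[T]:
    """Visit at most n links, then end the WebDriver session.

    scrape_one is a hypothetical callback holding the per-product logic;
    driver is assumed to expose .quit() like a Selenium WebDriver.
    """
    results: List[T] = []
    try:
        # islice stops after n links, no matter how many were collected
        for link in islice(links, n):
            results.append(scrape_one(driver, link))
    finally:
        # quit() closes all windows and ends the chromedriver session,
        # even if scrape_one raised an exception part-way through
        driver.quit()
    return results
```

For example, results = scrape_first_n(driver, item_links, 24, scrape_product), where scrape_product is a hypothetical function wrapping the existing loop body. For a plain list, iterating over item_links[:24] achieves the same cap; islice also works on lazy iterables.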
For more details, see this link.
Import:
import os
Remove the driver.close() call and add (Windows only, since taskkill is a Windows command):
os.system("taskkill /im chromedriver.exe")
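A sketch of the same idea using subprocess with a platform guard, assuming you only want the kill to run on Windows. The function name kill_chromedriver is hypothetical, and the /f flag is an addition to the command above: without it, taskkill often refuses to end console processes such as chromedriver.exe. driver.quit() remains the cleaner way to end the session.

```python
import subprocess
import sys


def kill_chromedriver() -> bool:
    """Force-kill chromedriver.exe processes on Windows.

    Returns True if taskkill was invoked, False on other platforms.
    """
    if sys.platform != "win32":
        # taskkill only exists on Windows; elsewhere rely on driver.quit()
        return False
    # /f forces termination, /im matches by image name; check=False because
    # taskkill exits non-zero when no matching process is running
    subprocess.run(["taskkill", "/f", "/im", "chromedriver.exe"],
                   check=False, capture_output=True)
    return True
```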