How do I fix this Selenium stale element reference error when scraping page 2 of a website?



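I'm using Selenium to scrape job postings from LinkedIn, but I keep getting a stale element reference error.

I've tried refreshing the page, adding waits, `WebDriverWait`, and a try/except block.

It always fails on page 2.

I know this is probably a DOM issue, and I've found some answers, but none of them seem to work for me.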

```python
import logging
import time

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def scroll_to(self, job_list_item):
    """Scroll the job-list column to the given list item, then click it."""
    self.driver.execute_script("arguments[0].scrollIntoView();", job_list_item)
    job_list_item.click()
    time.sleep(self.delay)


def get_position_data(self, job):
    """Get the position data for a posting.

    Parameters
    ----------
    job : Selenium WebElement

    Returns
    -------
    list of strings : [position, company, location, details]
    """
    # This is where the error is!
    # Split the card text on newlines ('\n'), not on the letter 'n'.
    [position, company, location] = job.text.split('\n')[:3]
    details = self.driver.find_element(By.ID, "job-details").text
    return [position, company, location, details]


def wait_for_element_ready(self, by, text):
    try:
        WebDriverWait(self.driver, self.delay).until(EC.presence_of_element_located((by, text)))
    except TimeoutException:
        logging.debug("wait_for_element_ready TimeoutException")


logging.info("Begin linkedin keyword search")
self.search_linkedin(keywords, location)
self.wait()
# Scrape pages; only do the first 8 pages, since after that the data isn't
# well suited for me anyway:
for page in range(2, 3):
    jobs = self.driver.find_elements(By.CLASS_NAME, "occludable-update")
    #jobs = self.driver.find_elements(By.CSS_SELECTOR, ".occludable-update.ember-view")
    #WebDriverWait(self.driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'occludable-update')))
    for job in jobs:
        self.scroll_to(job)
        #job.click()
        [position, company, location, details] = self.get_position_data(job)
        # do something with the data...
        data = (position, company, location, details)
        #logging.info(f"Added to DB: {position}, {company}, {location}")
        writer.writerow(data)
    # go to the next page:
    bot.driver.find_element(By.XPATH, f"//button[@aria-label='Page {page}']").click()
    bot.wait()
logging.info("Done scraping.")
logging.info("Closing DB connection.")
f.close()
bot.close_session()
```
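The line marked "This is where the error is!" also unpacks exactly three values from the card text, so a card with fewer than three lines raises `ValueError` instead of a Selenium error. The card text should be split on newline characters (`'\n'`); a separator of `'n'` would split on every letter n. Below is a minimal sketch of a safer parser; `parse_job_card` is a hypothetical helper, and the assumption that the first three lines are position, company, and location is taken from the question's code, not from anything LinkedIn guarantees.

```python
from typing import List


def parse_job_card(text: str, fields: int = 3) -> List[str]:
    """Return the first `fields` lines of a job card's text, padded with "".

    Assumes (as the question's code does) that the first three lines are
    position, company and location.
    """
    # splitlines() splits on "\n" and also copes with "\r\n" line endings.
    lines = [line.strip() for line in text.splitlines()[:fields]]
    # Pad short cards so unpacking into three names never raises ValueError.
    return lines + [""] * (fields - len(lines))
```

Used in `get_position_data`, this would read `position, company, location = parse_job_card(job.text)`. Note that it only fixes the parsing; reading `job.text` on a stale element still raises.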

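I'd expect the page to reload when you execute `job_list_item.click()`. In that case, because you are looping over `jobs`, which is a list of `WebElement` objects, those elements become stale: you come back to the page, but the references in `jobs` are already stale.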
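One way to act on this is to count the job cards once, then look the list up again on every iteration and take the i-th entry, so no reference outlives a reload. This is a minimal sketch, not a tested LinkedIn scraper: `scrape_current_page`, `find_jobs`, and `handle_job` are hypothetical names, and it is written against plain callables so it can be exercised without a browser.

```python
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")
R = TypeVar("R")


def scrape_current_page(
    find_jobs: Callable[[], Sequence[E]],
    handle_job: Callable[[E], R],
) -> List[R]:
    """Process every job card on the current page without reusing old references.

    find_jobs:  returns a *fresh* list of job-card elements each time it is called
    handle_job: scrolls to / clicks / reads one card and returns its data
    """
    results: List[R] = []
    count = len(find_jobs())  # how many cards the page showed initially
    for i in range(count):
        jobs = find_jobs()  # re-locate the list so jobs[i] is a fresh reference
        if i >= len(jobs):  # the list got shorter after a re-render; stop early
            break
        results.append(handle_job(jobs[i]))
    return results
```

With Selenium this might be called as `scrape_current_page(lambda: driver.find_elements(By.CLASS_NAME, "occludable-update"), handle_job)`, where `handle_job` is your own function that scrolls to the card, clicks it, and reads the fields. Re-finding costs one extra lookup per card, and a re-render between the lookup and the click can still go stale, so it narrows the window rather than closing it. For the page change itself, waiting with `WebDriverWait(driver, 10).until(EC.staleness_of(old_card))` after clicking the "Page N" button (where `old_card` is any card from the previous page) may help ensure you don't re-find the old list.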

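In general, to prevent stale elements, I avoid reusing an element across loop iterations or keeping it in a variable, especially when the element may change; instead I look it up again right before I use it.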
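The "look it up again right before using it" rule can be wrapped in a small retry helper: locate the element, act on it, and if it has gone stale, locate it again and retry. This is a sketch under assumptions: `with_fresh_element` is a hypothetical name, and the retry count and delay are arbitrary defaults. The exception type is passed in as a parameter so the helper does not import Selenium itself.

```python
import time
from typing import Callable, Type, TypeVar

E = TypeVar("E")
T = TypeVar("T")


def with_fresh_element(
    locate: Callable[[], E],
    action: Callable[[E], T],
    stale_exc: Type[BaseException],
    retries: int = 3,
    delay: float = 0.5,
) -> T:
    """Look the element up again on every attempt and run `action` on it.

    If the lookup or `action` raises `stale_exc`, sleep `delay` seconds and
    retry with a brand-new reference; re-raise after `retries` failed attempts.
    """
    for attempt in range(retries):
        try:
            return action(locate())  # never reuse a reference across attempts
        except stale_exc:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
    raise ValueError("retries must be at least 1")
```

With Selenium, reading the details pane could look like `with_fresh_element(lambda: driver.find_element(By.ID, "job-details"), lambda el: el.text, StaleElementReferenceException)`, with `StaleElementReferenceException` imported from `selenium.common.exceptions`. Taking a locator function rather than an element is the point of the design: the helper can only retry usefully if it is able to find the element again.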
