Python Selenium And BS



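I have been trying to get an element from the page I reach after clicking through to the next page, using soup.find_all. The problem is that it gives me the elements from the first page instead. Thanks in advance.

Code: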

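import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome("C:/Users/user/PycharmProjects/Graph/chromedriver.exe")
driver.get("https://blockchain.coinmarketcap.com/chain/bitcoin")
time.sleep(2)

# next_page_css holds the CSS selector for the next-page button (its value is not shown in the post)
next_page = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, next_page_css)))
time.sleep(0.1)
actions = ActionChains(driver)
actions.move_to_element(next_page).perform()
next_page.click()
time.sleep(0.1)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "html.parser")
stats = str(soup.find_all('tbody', {"class": "ant-table-tbody"}))
print(stats)
driver.quit()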

Website: https://blockchain.coinmarketcap.com/chain/bitcoin

[Image: screenshot of the button being clicked]

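I'm not entirely sure what you want to extract, but I did manage to get the data from page 2.

As far as I can see, there were two problems:
1) The sleep time was too short for the page to load.
2) The CSS selector did not work, so I had to change it.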

next_page = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".ant-pagination-item-2")))
time.sleep(2)
actions = ActionChains(driver)
actions.move_to_element(next_page).perform()
next_page.click()
time.sleep(2)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "html.parser")
stats = str(soup.find_all('tbody', {"class": "ant-table-tbody"}))
print(stats)

Output:

<tbody class="ant-table-tbody"><tr class="ant-table-row ant-table-row-level-0" data-row-key="0000000000000000000aae7c345c1b0acde7650e5448ca5dfc6c8de7d415ed22">
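If the goal is the individual rows rather than the whole tbody as one string, the data-row-key attribute visible in the output above could be collected per row. The following is only a sketch under assumptions: it uses the standard-library html.parser instead of BeautifulSoup so that it runs without third-party packages, and the names RowKeyParser and extract_row_keys are hypothetical. With BeautifulSoup, roughly the same result should be obtainable from soup.select("tbody.ant-table-tbody tr").

```python
from html.parser import HTMLParser


class RowKeyParser(HTMLParser):
    """Collect data-row-key values from <tr> tags inside tbody.ant-table-tbody."""

    def __init__(self):
        super().__init__()
        self.in_table_body = False
        self.row_keys = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "tbody" and "ant-table-tbody" in classes:
            # Entering the table body that the answer's find_all call targets.
            self.in_table_body = True
        elif tag == "tr" and self.in_table_body and attrs.get("data-row-key"):
            self.row_keys.append(attrs["data-row-key"])

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_table_body = False


def extract_row_keys(page_source):
    """Return the data-row-key of every row in the table body, in document order."""
    parser = RowKeyParser()
    parser.feed(page_source)
    return parser.row_keys


# Intended use (assumes a live Selenium driver, as in the answer above):
#   keys = extract_row_keys(driver.page_source)
```

Comparing the list returned for page 1 with the list returned after the click is also a quick way to confirm that the table really changed.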

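After next_page.click() and before content = driver.page_source.encode('utf-8').strip(), you have to add a delay so that the page can load the next page's data.
Hard-coded sleeps are not recommended, though.
A better approach is to grab a piece of data shown on the previous page and then use an expected condition to wait until that data is no longer displayed.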

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# element_located_by_previous_value: an XPath matching an element that appears only on the previous page
WebDriverWait(driver, 30).until(EC.invisibility_of_element_located((By.XPATH,element_located_by_previous_value)))
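One possible way to implement "wait until the previous page's data is gone" is a custom wait condition that compares the first table row before and after the click. This is a rough sketch, not a tested solution for this site: the row selector and the use of data-row-key are assumptions based on the output shown earlier, and first_row_key and first_row_changed are hypothetical helper names. WebDriverWait.until accepts any callable that takes the driver, so the returned function can be passed to it directly.

```python
# Selenium's By.CSS_SELECTOR is the plain string "css selector"; it is spelled
# out here so that this sketch needs no imports of its own.
CSS = "css selector"
ROW = "tbody.ant-table-tbody tr"  # assumed selector for the table rows


def first_row_key(driver):
    """Return the data-row-key of the first table row currently in the DOM."""
    return driver.find_element(CSS, ROW).get_attribute("data-row-key")


def first_row_changed(old_key):
    """Build a wait condition that becomes truthy once the first row differs from old_key."""
    def condition(driver):
        try:
            new_key = first_row_key(driver)
        except Exception:
            # Row missing or stale while the table re-renders: keep waiting.
            return False
        return new_key if new_key and new_key != old_key else False
    return condition


# Intended use (assumes the driver and next_page objects from the code above):
#   old_key = first_row_key(driver)
#   next_page.click()
#   WebDriverWait(driver, 30).until(first_row_changed(old_key))
#   soup = BeautifulSoup(driver.page_source, "html.parser")
```

Compared with a fixed time.sleep, this returns as soon as the new rows appear and raises a timeout if they never do.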
