我正在尝试用python selenium抓取下面给出的url。
https://www.rtilinks.com/?5b5483ba2d=OUhWbXlXOGY4cEE0VEtsK1pWSU5CdEJob0hiR0xFNjN2M252ZXlOWnp0RC9yaFpvN3ZNeW9SazlONWJSTWpvNGNpR0FwWUZwQWduaXdFY202bkcrUHAybkVDc0hMMk9EWFdweitsS0xHa0U9
这是我的代码
from pprint import pprint
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from PIL import Image
import requests
from time import sleep
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',options=chrome_options)
wd.get("https://www.rtilinks.com/?5b5483ba2d=OUhWbXlXOGY4cEE0VEtsK1pWSU5CdEJob0hiR0xFNjN2M252ZXlOWnp0RC9yaFpvN3ZNeW9SazlONWJSTWpvNGNpR0FwWUZwQWduaXdFY202bkcrUHAybkVDc0hMMk9EWFdweitsS0xHa0U9
")
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.ID, "soralink-human-verif-main"))).click()
sleep(10)
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//img[@id='showlink' and @x-onclick]"))).click()
在运行完这段代码后,我应该被重定向到https://rareapk.com/finance/?n1p0ei2ng5yd3gz但它停留在同一页。
下面给出了我正在单击的元素。
<img class="spoint" id="showlink" x-onclick="changeLink()" src="https://eductin.com/wp-content/uploads/2021/06/Download.png">
元素图像
我的代码在做什么
- 首先转到这个url
- 然后单击
I'M NOT A ROBOT
- 加载下一页之后,selenium将等待10秒
- 然后单击一个图像(具有文本
DOWNLOAD RTI
(,该图像应将其重定向到REDIRECTED URL
但在最后一步中,它停留在同一个url,它不重定向
我尝试了以下方法
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//img[@id='showlink' and @x-onclick]"))).click()
wd.find_element(By.ID, "showlink").click()
我测试了没有headless
的代码,我看到浏览器打开了预期的页面,但wd.current_url
仍然显示旧的URL(wd.title
也显示旧的标题(
所有的问题都可能是因为页面在新的tab
中打开了新的URL,并且需要使用wd.switch_to_window(...)
来访问其他tab
。
此代码使用switch_to_window(...)
,并在其他tab
中显示正确的URL(和标题(。
BTW:我不得不添加"Consent"
,因为我的浏览器有时会显示它。
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
from webdriver_manager.chrome import ChromeDriverManager, ChromeType
#from webdriver_manager.firefox import GeckoDriverManager
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
#wd = webdriver.Chrome('chromedriver', options=chrome_options)
wd = webdriver.Chrome(service=Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()), options=chrome_options)
#wd = webdriver.Firefox(service=Service(GeckoDriverManager().install()))
wd.get("https://www.rtilinks.com/?5b5483ba2d=OUhWbXlXOGY4cEE0VEtsK1pWSU5CdEJob0hiR0xFNjN2M252ZXlOWnp0RC9yaFpvN3ZNeW9SazlONWJSTWpvNGNpR0FwWUZwQWduaXdFY202bkcrUHAybkVDc0hMMk9EWFdweitsS0xHa0U9")
p = wd.current_window_handle
print('current_window_handle:', p)
try:
print('Waiting for: "Consent"')
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@aria-label='Consent']"))).click()
except Exception as ex:
print('Exception:', ex)
print('Waiting for: "I'm not a robot"')
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.ID, "soralink-human-verif-main"))).click()
print('Waiting for: "Download (RTI)"')
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//img[@id='showlink' and @x-onclick]"))).click()
print('--- active tab ---')
print('current_window_handle:', p)
print('current_url:', wd.current_url)
print('title:', wd.title)
print('--- other tabs ---')
chwd = wd.window_handles
for w in chwd:
#switch focus to child window
if w != p:
wd.switch_to.window(w)
print('current_window_handle:', w)
print('current_url:', wd.current_url)
print('title:', wd.title)
print('---')
wd.close()
结果:
Waiting for: "Consent"
Waiting for: "I'm not a robot"
Waiting for: "Download (RTI)"
--- active tab ---
current_window_handle: CDwindow-31FDEC2C62AA0666A8F3A1DD2133D02C
current_url: https://eductin.com/how-to-fix-and-restore-deleted-mac-system-files/
title: How to fix and Restore deleted Mac system files. – Eductin
--- other tabs ---
current_window_handle: CDwindow-CB1EAE5B6DCD4ACF5D061ED4ECC314CD
current_url: https://sakarnewz.com/
title: SakarNewz – BOOST YOUR KNOWLEDGE WITH TECH NEWS AND UPDATES
---
编辑:
有时,此代码在显示有关其他选项卡的信息时会遇到问题,因为选项卡似乎一直运行JavaScript,并且Selenium可能无法访问数据。