Selenium headless browser gets a Cloudflare error when using undetected-chromedriver



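I'm trying to scrape data from betonline.ag with Selenium. When I switch to a headless browser (which I need in order to eventually move the script to an AWS Lambda function), I get the following error: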

File "C:\Users\adelu\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 246, in check_response
raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
selenium.common.exceptions.UnexpectedAlertPresentException: Alert Text: There is no admin configuration!
Message: unexpected alert open: {Alert text : There is no admin configuration!}
(Session info: headless chrome=105.0.5195.127)
Stacktrace:
Backtrace:
Ordinal0 [0x006BDF13+2219795]
Ordinal0 [0x00652841+1779777]
Ordinal0 [0x0056423D+803389]
Ordinal0 [0x005BEFFE+1175550]
Ordinal0 [0x005AE616+1107478]
Ordinal0 [0x00587F89+950153]
Ordinal0 [0x00588F56+954198]
GetHandleVerifier [0x009B2CB2+3040210]
GetHandleVerifier [0x009A2BB4+2974420]
GetHandleVerifier [0x00756A0A+565546]
GetHandleVerifier [0x00755680+560544]
Ordinal0 [0x00659A5C+1808988]
Ordinal0 [0x0065E3A8+1827752]
Ordinal0 [0x0065E495+1827989]
Ordinal0 [0x006680A4+1867940]
BaseThreadInitThunk [0x77296739+25]
RtlGetFullPathName_UEx [0x77828FD2+1218]
RtlGetFullPathName_UEx [0x77828F9D+1165]

Inspecting one of the divs, I can see that this is a Cloudflare issue:

driver.find_elements(By.TAG_NAME, 'div')[2].text

'Oops! It’s likely that we’re having\nan internal systems issue\nIf the problem persists, please contact 
customer support so we can help.\nAccess denied\nYou cannot access ultraplay.betonline.ag. Refresh the 
page or contact the site owner to request access.\nRay ID: 74d6c9316f273b7c\nTimestamp: 2022-09-20 
01:28:42 UTC\nYour IP address: ##.###.##.###\nRequested URL: ultraplay.betonline.ag/esports/early-markets\n
Error reference number: 1020\nServer ID: FL_106F56\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; 
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5195.127 Safari/537.36\nCloudflare Ray 
ID:74d6c9316f273b7c\nPerformance & Security by Cloudflare\nHelp Center'
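
For anyone who wants to detect this condition in the script rather than by reading the div text, here is a minimal sketch. `parse_cloudflare_block` is a hypothetical helper, not part of Selenium or undetected-chromedriver, and it assumes the block page keeps the "Error reference number" and "Ray ID" labels shown in the output above; Cloudflare could change that wording.

```python
import re
from typing import Dict, Optional

# Hypothetical helper, not part of any library: it only pattern-matches the
# labels that appear in the block page text quoted above.
_ERROR_RE = re.compile(r"Error reference number:\s*(\d+)")
_RAY_RE = re.compile(r"Ray ID:\s*([0-9a-f]+)")


def parse_cloudflare_block(page_text: str) -> Optional[Dict[str, str]]:
    """Return the error code and Ray ID if the text looks like a Cloudflare block page."""
    error = _ERROR_RE.search(page_text)
    if error is None:
        return None  # no error reference number, so treat it as a normal page
    ray = _RAY_RE.search(page_text)
    return {
        "error_code": error.group(1),
        "ray_id": ray.group(1) if ray else "",
    }


# Shortened sample built from the text in the question.
sample = (
    "Access denied\nRay ID: 74d6c9316f273b7c\n"
    "Error reference number: 1020\nServer ID: FL_106F56"
)
info = parse_cloudflare_block(sample)
```

With a live driver, the page text could be taken from `driver.find_element(By.TAG_NAME, 'body').text` and passed to the helper after each `driver.get(...)`.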

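I'm currently using undetected-chromedriver because I was getting this same error before, and switching to the undetected driver fixed it. As soon as I try to run the browser headless, the error comes back, and I haven't been able to get around it.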

Here is my current code:

from math import floor
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import undetected_chromedriver as uc
import os
import time
EARLY_MARKETS = 'https://ultraplay.betonline.ag/esports/early-markets'
options = webdriver.ChromeOptions() 
options.add_argument('--headless')
options.add_argument("start-maximized")
options.add_argument("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36")
driver = uc.Chrome(options=options)
driver.get(EARLY_MARKETS)

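In the stock version of Selenium WebDriver, you need to pass the user agent as the value of the --user-agent key, not just pass the bare user-agent string as an argument: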

options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36')

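Alternatively, you can pass the headers as a dict, which also lets you set fields other than User-Agent. Note that header_overrides is, as far as I know, an attribute from the selenium-wire package rather than plain Selenium, so a stock driver will not act on it: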

header = {'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"}
driver.header_overrides = header
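
If selenium-wire is not available, a similar effect can probably be had through the Chrome DevTools Protocol. The sketch below is a swapped-in technique, not what the snippet above does: `apply_header_overrides` is a hypothetical helper, and it assumes the driver is Chromium-based and exposes `execute_cdp_cmd`, as Selenium 4's Chrome driver does.

```python
from typing import Any, Dict


def apply_header_overrides(driver: Any, headers: Dict[str, str]) -> None:
    """Send header overrides on later requests via the Chrome DevTools Protocol.

    Assumes `driver` is a Chromium-based Selenium driver with execute_cdp_cmd.
    """
    extra = dict(headers)  # copy so the caller's dict is not mutated
    user_agent = extra.pop("User-Agent", None)

    # The Network domain has to be enabled before its commands are used.
    driver.execute_cdp_cmd("Network.enable", {})
    if user_agent is not None:
        # The user agent has its own CDP command, so it is handled separately.
        driver.execute_cdp_cmd(
            "Network.setUserAgentOverride", {"userAgent": user_agent}
        )
    if extra:
        # Every remaining header is attached to outgoing requests.
        driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": extra})
```

It would be called once after creating the driver and before `driver.get(...)`, for example `apply_header_overrides(driver, header)`.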

start-maximized should probably also have the leading double dash, i.e. --start-maximized.
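
Putting the points above together, the option setup might look roughly like the following. This is only a sketch: `chrome_arguments` and `configure` are hypothetical helper names, the user-agent value is just the example string from this thread, and nothing here guarantees that Cloudflare will accept a headless session.

```python
from typing import Any, List

# Assumed example value: the desktop Chrome user agent used earlier in this thread.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
)


def chrome_arguments(user_agent: str = USER_AGENT) -> List[str]:
    """Build the switches with leading dashes and the user agent as key=value."""
    return [
        "--headless",
        "--start-maximized",
        f"--user-agent={user_agent}",
    ]


def configure(options: Any, user_agent: str = USER_AGENT) -> Any:
    """Apply the switches to any options object that has add_argument()."""
    for argument in chrome_arguments(user_agent):
        options.add_argument(argument)
    return options
```

With undetected-chromedriver this could be used as `driver = uc.Chrome(options=configure(uc.ChromeOptions()))`. Keeping the argument list in a plain function makes it easy to print and check what is actually being sent to Chrome.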
