Unable to open a web page with Selenium in Python



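I am new to Selenium with Python and I am trying to scrape data from a website. My code is below; I have taken all the precautions I could think of to avoid being blocked.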

from random import randrange
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


# Function to pick a random user agent (file holds one per line).
def generate_user_agent():
    with open("user_agents.txt", "r") as user_agents_file:
        # Split on line breaks and skip blank lines.
        user_agents = [ua for ua in user_agents_file.read().split("\n") if ua]
    i = randrange(len(user_agents))
    return user_agents[i]


# Function to pick a random proxy address (file holds one per line).
def generate_ip_address():
    with open("proxyscrape_premium_http_proxies.txt", "r") as proxies_file:
        proxies = [p for p in proxies_file.read().split("\n") if p]
    i = randrange(len(proxies))
    return proxies[i]


# Function to create and set Chrome options.
def set_chrome_options():
    proxy = generate_ip_address()
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument("--incognito")
    options.add_argument(f'--proxy-server={proxy}')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    return options, proxy


# Function to create a webdriver object and set its properties.
def create_webdriver():
    options, proxy = set_chrome_options()
    userAgent = generate_user_agent()
    webdriver.DesiredCapabilities.CHROME['proxy'] = {
        "httpProxy": proxy,
        "ftpProxy": proxy,
        "sslProxy": proxy,
        "proxyType": "MANUAL",
    }
    webdriver.DesiredCapabilities.CHROME['acceptSslCerts'] = True
    # executable_path is the Selenium 3 style; Selenium 4 takes a Service object instead.
    driver = webdriver.Chrome(options=options, executable_path=r'chromedriver.exe')
    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
    driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": userAgent})
    return driver


url = 'http://www.doctolib.de/impfung-covid-19-corona/berlin'
driver = create_webdriver()
driver.get(url)

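The page does not load when opened through the Selenium WebDriver (although it opens fine in a normal browser). Below is a screenshot of what the browser shows when I run the code.

Please let me know if I have missed anything. Any help would be appreciated.

PS: I am using premium proxies for IP rotation.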

[Screenshot: browser output]

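I have had a similar experience in the past: even after trying IP rotation, user-agent rotation, and proxies, the website still detected that Selenium was being used.

I suggest you use the undetected-chromedriver library.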

pip install undetected-chromedriver

With it I was able to load the site without any problem. Here is the code snippet:

# Newer releases expose the same API as: import undetected_chromedriver as uc
import undetected_chromedriver.v2 as uc

driver = uc.Chrome()
with driver:
    driver.get('http://www.doctolib.de/impfung-covid-19-corona/berlin')
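
If you still want to keep the proxy rotation from the question, something along these lines might work as a starting point. This is only a sketch under a few assumptions: the proxy file contains one `host:port` entry per line, a recent undetected-chromedriver release is installed (so the top-level import works), and `pick_random_line`, `build_proxy_argument`, and `create_driver` are hypothetical helper names, not part of any library.

```python
import random
from pathlib import Path


def pick_random_line(path):
    """Return a random non-empty line from a text file (hypothetical helper)."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    lines = [line for line in lines if line]
    if not lines:
        raise ValueError(f"{path} contains no usable entries")
    return random.choice(lines)


def build_proxy_argument(proxy):
    """Format a host:port string as Chrome's --proxy-server switch."""
    return f"--proxy-server={proxy}"


def create_driver(proxy_file="proxyscrape_premium_http_proxies.txt"):
    """Start an undetected-chromedriver session behind a randomly chosen proxy."""
    # Imported here so the helpers above stay usable without the package installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(build_proxy_argument(pick_random_line(proxy_file)))
    return uc.Chrome(options=options)
```

Calling `create_driver()` and then `driver.get(url)` would replace the `create_webdriver()` call from the question. Note that this does not cover proxies that require a username and password.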

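I ran into a similar problem using Firefox on Linux. I simply deleted the log file that geckodriver creates; it had grown quite large for a text file (4.8 MB), and after that everything started working normally.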
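
The cleanup can be scripted so it runs before each session. The sketch below assumes the log is a file named `geckodriver.log` in the current working directory (the usual default in older Selenium versions; adjust the path if yours is elsewhere). `remove_stale_log` is a hypothetical helper name.

```python
from pathlib import Path


def remove_stale_log(log_path="geckodriver.log"):
    """Delete the log file if it exists; return True when a file was removed."""
    path = Path(log_path)
    if path.is_file():
        path.unlink()
        return True
    return False
```

Call `remove_stale_log()` right before creating the Firefox driver so each run starts with a fresh log.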
