Using a proxy with Selenium does not give the correct results



I have this function that works fine, but without a proxy. It returns the HTML content I need when extracting from the website:

def extract_listing_html(url):
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    driver = webdriver.Chrome(service=Service(driver_path))
    driver.get(url)
    time.sleep(5)  # give the page time to load
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    return soup
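For reference, I call it like this (the URL below is just a placeholder, not the actual site I am scraping):

soup = extract_listing_html("https://example.com")  # placeholder URL
print(soup.title)  # quick check that the real page HTML came back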

I want to use a proxy, and this is what I have so far, but I am not getting the same results as when I do not use a proxy:

def extract_listing_html(url):
    PROXY = "164.155.145.1:80"
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    chrome_options = Options()
    chrome_options.add_argument('--proxy-server=%s' "http://" + PROXY)
    driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)
    driver.get(url)
    time.sleep(5)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    return soup

I played around with it and found that passing options=chrome_options to webdriver.Chrome() is what causes it to return different HTML, but I am not sure why.

[Screenshot: HTML without proxy]

[Screenshot: HTML with proxy]

They look very different, and I am not sure what is causing it.
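One thing worth noting: because Python concatenates adjacent string literals, the argument '--proxy-server=%s' "http://" + PROXY evaluates to '--proxy-server=%shttp://164.155.145.1:80', so Chrome receives a malformed flag with a stray %s in it. A minimal sketch of what the flag was presumably meant to look like (assuming the free proxy itself is reachable and not blocked by the site):

from selenium.webdriver.chrome.options import Options

PROXY = "164.155.145.1:80"  # same proxy as above; free proxies are often dead or blocked
chrome_options = Options()
# Build the flag with an f-string so no stray "%s" ends up in the value:
# --proxy-server=http://164.155.145.1:80
chrome_options.add_argument(f"--proxy-server=http://{PROXY}")

Even with the flag formatted correctly, the target site may serve different HTML to a public proxy's IP, which could also explain the mismatch.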

Imports:

import time 
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

LATEST UPDATE

Using a user agent got me results.

Installed fake_useragent with pip install pyyaml ua-parser user-agents fake-useragent.

from fake_useragent import UserAgent

def extract_listing_html(url):
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    opts = Options()
    ua = UserAgent()
    userAgent = ua.random  # pick a random real-world user-agent string
    print(userAgent)
    opts.add_argument(f'user-agent={userAgent}')
    driver = webdriver.Chrome(service=Service(driver_path), options=opts)
    driver.get(url)
    time.sleep(5)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    return soup
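To confirm the spoofed user agent is actually what the browser sends, here is a quick standalone sanity check (just a sketch, not part of the scraper itself):

from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
opts = Options()
opts.add_argument(f"user-agent={UserAgent().random}")
driver = webdriver.Chrome(service=Service(driver_path), options=opts)
# Ask the browser which user agent it actually reports.
print(driver.execute_script("return navigator.userAgent"))
driver.quit()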
