How to get browser network logs using Python Selenium



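I am trying to get browser network logs with Selenium so that I can debug requests and responses. Could you help me find a way to do this?

I am using Selenium 3.14.0 and the latest Chrome browser.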

Using Python + Selenium + Firefox

Don't set up a proxy unless you have to. To get outbound API requests, I used the solution from this answer, but in Python: https://stackoverflow.com/a/45859018/14244758

test = driver.execute_script("var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return network;")
for item in test:
    print(item)

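You get a list of dicts.

This lets me see all the network requests. I use it to parse a parameter out of one of the requests so that I can make my own requests against the API.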
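As a rough sketch of that parsing step, the helper below scans entries shaped like the ones returned by `performance.getEntries()` (dicts whose `name` key holds the request URL) and pulls one query parameter out of the first matching URL. The function name, the `/api/search` URL fragment, and the `session_id` parameter are hypothetical and only there for illustration.

```python
from urllib.parse import parse_qs, urlparse


def find_query_param(entries, url_fragment, param):
    """Return the first value of `param` from the first entry whose URL contains `url_fragment`."""
    for entry in entries:
        url = entry.get("name", "")  # performance entries keep the URL under "name"
        if url_fragment not in url:
            continue
        values = parse_qs(urlparse(url).query).get(param)
        if values:
            return values[0]
    return None


# Hypothetical sample, shaped like the list returned by driver.execute_script(...)
sample_entries = [
    {"name": "https://example.com/static/app.js", "entryType": "resource"},
    {"name": "https://example.com/api/search?q=shoes&session_id=abc123", "entryType": "resource"},
]
print(find_query_param(sample_entries, "/api/search", "session_id"))
```

In a real run you would pass the list returned by `driver.execute_script(...)` instead of `sample_entries`.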

Using Python + Selenium + Chrome

EDIT: this answer has gotten a lot of attention, so here is how I do it now with Chrome (taken from the undetected-chromedriver code):

import json

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# Ask ChromeDriver to record the performance (network) and browser logs
chrome_options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL", "browser": "ALL"}
)
driver = webdriver.Chrome(options=chrome_options)

# Visit your website, log in, etc., then:
bearer_token = None
log_entries = driver.get_log("performance")
for entry in log_entries:
    # Each entry wraps a DevTools event as a JSON string under "message"
    obj = json.loads(entry["message"])
    message = obj.get("message", {})
    method = message.get("method")
    if method in ("Network.requestWillBeSentExtraInfo", "Network.requestWillBeSent"):
        # associatedCookies is missing on some events, so default to an empty list
        for c in message.get("params", {}).get("associatedCookies", []):
            if c["cookie"]["name"] == "authToken":
                bearer_token = c["cookie"]["value"]
    print(type(message), method)
    print("--------------------------------------")

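With this approach you can parse out the tokens, API keys, and so on that the browser sends to the server.

Using Python and ChromeDriver

To get network logs, you need to install BrowserMob Proxy alongside Selenium in Python: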
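The same idea works for request headers as well as cookies. The sketch below assumes the entry shape returned by `driver.get_log("performance")` (a dict whose `message` key is a JSON string wrapping a DevTools event) and looks a header up case-insensitively. The helper name and the sample data are made up for illustration; `Network.requestWillBeSent` events carry headers under `params.request.headers`, while `Network.requestWillBeSentExtraInfo` events carry them under `params.headers`.

```python
import json


def find_request_header(log_entries, header_name):
    """Return the first value of `header_name` seen in Network.requestWillBeSent* events."""
    wanted = header_name.lower()
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        if method == "Network.requestWillBeSent":
            headers = params.get("request", {}).get("headers", {})
        elif method == "Network.requestWillBeSentExtraInfo":
            headers = params.get("headers", {})
        else:
            continue
        for name, value in headers.items():
            if name.lower() == wanted:
                return value
    return None


# Hypothetical entries, shaped like the output of driver.get_log("performance")
sample_log = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"requestId": "1",
                   "request": {"url": "https://example.com/api", "headers": {"Accept": "*/*"}}},
    }})},
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSentExtraInfo",
        "params": {"requestId": "1", "headers": {"authorization": "Bearer abc123"}},
    }})},
]
print(find_request_header(sample_log, "Authorization"))
```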

pip install browsermob-proxy

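Then we need to download the BrowserMob Proxy archive from https://bmp.lightbody.net/.

Unzip it to any folder (for example path/to/extracted_folder). This folder contains the browsermob-proxy binary. We need to pass its path when calling Server() in the Python code.

You need to start the proxy server and configure the proxy in the Chrome options of the Chrome driver: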

from browsermobproxy import Server
from selenium import webdriver

# Start the BrowserMob Proxy server and create a proxy instance
server = Server("path/to/extracted_folder/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Configure the browser proxy in Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
browser = webdriver.Chrome(options=chrome_options)

# Tag the HAR (network logs) with a name
proxy.new_har("google")

Then you can navigate to the page with Selenium:

browser.get("http://www.google.co.in")

After navigating, you can get the network logs from the proxy in JSON format:

print(proxy.har)  # returns the network logs (HAR) as JSON

Finally, stop the proxy server before quitting the driver:

server.stop()
browser.quit()
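The HAR returned by the proxy is a nested dict, so a small helper makes it easier to read than printing the whole thing. This is a minimal sketch that assumes the standard HAR layout (`log.entries`, each entry with `request` and `response` objects); the function name and the sample data are hypothetical.

```python
def summarize_har(har):
    """Return a (method, url, status) tuple for every entry in a HAR dict."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((request.get("method"), request.get("url"), response.get("status")))
    return rows


# Hypothetical HAR fragment, shaped like the dict returned by proxy.har
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "http://www.google.co.in/"},
             "response": {"status": 200}},
            {"request": {"method": "POST", "url": "http://www.google.co.in/log"},
             "response": {"status": 204}},
        ]
    }
}
for method, url, status in summarize_har(sample_har):
    print(method, url, status)
```

With the code above you would call `summarize_har(proxy.har)` before `server.stop()`.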

Try selenium-wire. I think it is a better approach, and it also offers undetected-chromedriver support for dealing with bot detection.

I am using Selenium 4.11, but the following may help.
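A minimal sketch of what that can look like, assuming selenium-wire is installed (`pip install selenium-wire`) and a Chrome driver is available. selenium-wire exposes captured traffic through `driver.requests`, where each request has `method`, `url`, and a `response` that is `None` when no response was captured. The helper names `summarize_requests` and `capture` are made up for this example.

```python
def summarize_requests(requests):
    """Return (method, url, status_code) for each captured request that has a response."""
    rows = []
    for request in requests:
        if request.response is None:  # no response was captured for this request
            continue
        rows.append((request.method, request.url, request.response.status_code))
    return rows


def capture(url):
    """Open `url` in Chrome through selenium-wire and return the summary rows."""
    # Imported lazily so the helper above can be used without selenium-wire installed
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return summarize_requests(driver.requests)
    finally:
        driver.quit()
```

Calling `capture("https://your-website.com")` would then return one row per request the browser made while loading the page.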

import json

from selenium import webdriver

# Initialize Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
# The performance log is only available when this capability is set
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Navigate to the target website
driver.get("https://your-website.com")

# Capture network log entries
log_entries = driver.get_log("performance")

# Last known document URL, used to label the headers that follow it
last_known_url = None

# Lists to store request and response headers
request_headers_data = []
response_headers_data = []

for entry in log_entries:
    obj = json.loads(entry["message"])
    message = obj.get("message", {})
    method = message.get("method")
    params = message.get("params", {})

    # Update last known URL if available
    url = params.get("documentURL")
    if url:
        last_known_url = url

    if method in ('Network.requestWillBeSentExtraInfo', 'Network.requestWillBeSent'):
        # requestWillBeSent nests headers under "request"; the ExtraInfo event has them at the top level
        request_headers = params.get('request', {}).get('headers') or params.get('headers', {})
        request_headers_data.append({"url": last_known_url, "headers": request_headers})

    if method in ('Network.responseReceivedExtraInfo', 'Network.responseReceived'):
        # responseReceived nests headers under "response"; the ExtraInfo event has them at the top level
        response_headers = params.get('response', {}).get('headers') or params.get('headers', {})
        response_headers_data.append({"url": last_known_url, "headers": response_headers})

    if method == 'Network.loadingFinished':
        # A network request has finished: print everything collected so far
        print("Request Headers:")
        for request_data in request_headers_data:
            print("URL:", request_data["url"])
            print(request_data["headers"])
        print("Response Headers:")
        for response_data in response_headers_data:
            print("URL:", response_data["url"])
            print(response_data["headers"])
        print('--------------------------------------')

# Close the WebDriver
driver.quit()

Or, if you want them in order, so that each request is followed by its response:

import json

from selenium import webdriver

# Initialize Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
# The performance log is only available when this capability is set
chrome_options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Navigate to the target website
driver.get("https://your-website.com")

# Capture network log entries
log_entries = driver.get_log("performance")

# Lists to store request and response headers
request_headers_data = []
response_headers_data = []
last_known_url = None  # To keep track of the URL associated with the latest entry

for entry in log_entries:
    obj = json.loads(entry["message"])
    message = obj.get("message", {})
    method = message.get("method")
    params = message.get("params", {})

    # documentURL is only present on some events, so fall back to the last one seen
    url = params.get("documentURL") or last_known_url
    last_known_url = url

    if method in ('Network.requestWillBeSentExtraInfo', 'Network.requestWillBeSent'):
        # requestWillBeSent nests headers under "request"; the ExtraInfo event has them at the top level
        request_headers = params.get('request', {}).get('headers') or params.get('headers', {})
        request_headers_data.append({"url": url, "headers": request_headers})

    if method in ('Network.responseReceivedExtraInfo', 'Network.responseReceived'):
        # responseReceived nests headers under "response"; the ExtraInfo event has them at the top level
        response_headers = params.get('response', {}).get('headers') or params.get('headers', {})
        response_headers_data.append({"url": url, "headers": response_headers})

# Iterate through the headers sequentially, pairing them by position
for request_headers, response_headers in zip(request_headers_data, response_headers_data):
    print("Request URL:", request_headers["url"])
    print("Request Headers:", request_headers["headers"])
    print("Response URL:", response_headers["url"])
    print("Response Headers:", response_headers["headers"])
    print('--------------------------------------')

# Close the WebDriver
driver.quit()

In the latest Python Selenium release, 4.1.0, webdriver.get_log(self, log_type) documents only 4 log types:

driver.get_log('browser')
driver.get_log('driver')
driver.get_log('client')
driver.get_log('server')

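The performance log cannot be retrieved through driver.get_log out of the box; with Chrome it first has to be enabled through the goog:loggingPrefs capability.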

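To get only the network logs up until the page finishes loading (without the ajax/async network logs produced during the main use of the page), you can use the performance log: http://chromedriver.chromium.org/logging/performance-log

Enable the performance log for ChromeDriver, for example: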

DesiredCapabilities cap = DesiredCapabilities.chrome();
LoggingPreferences logPrefs = new LoggingPreferences();
logPrefs.enable(LogType.PERFORMANCE, Level.ALL);
cap.setCapability(CapabilityType.LOGGING_PREFS, logPrefs);
RemoteWebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:9515"), cap);

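The Chromium performance log page also links to this complete example, https://gist.github.com/klepikov/5457750, which has Java and Python code for getting the performance log.

Again, it is important to remember that this only gets the network requests made up until the page finishes loading. After that, the driver will only return the same performance logs until the page is reloaded.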


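If you want to get network logs asynchronously throughout the use of the page, you can use BrowserMobProxy as a proxy server for the Selenium driver and capture all of those network requests. You can then get the captured requests from the HAR file that BrowserMobProxy generates: https://github.com/lightbody/browsermob-proxy#using-with-selenium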

// start the proxy
BrowserMobProxy proxy = new BrowserMobProxyServer();
proxy.start(0);
// get the Selenium proxy object
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);
// configure it as a desired capability
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability(CapabilityType.PROXY, seleniumProxy);
// start the browser up
WebDriver driver = new FirefoxDriver(capabilities);
// enable more detailed HAR capture, if desired (see CaptureType for the complete list)
proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT);
// create a new HAR with the label "yahoo.com"
proxy.newHar("yahoo.com");
// open yahoo.com
driver.get("http://yahoo.com");
// get the HAR data
Har har = proxy.getHar();

Once you have the HAR file, it is a JSON-like list of network events that you can work with.
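For instance, once the HAR has been saved as JSON and loaded into a Python dict, you could rank its entries by how long they took. This is only a sketch: it assumes the standard HAR `time` field (total elapsed time of the entry in milliseconds), and the function name and sample data are hypothetical.

```python
def slowest_requests(har, count=3):
    """Return (url, time_ms) for the `count` slowest entries in a HAR dict."""
    entries = har.get("log", {}).get("entries", [])
    ranked = sorted(entries, key=lambda e: e.get("time", 0), reverse=True)
    return [(e["request"]["url"], e.get("time", 0)) for e in ranked[:count]]


# Hypothetical HAR fragment with three captured requests
sample_timings = {
    "log": {
        "entries": [
            {"time": 120, "request": {"url": "http://yahoo.com/"}},
            {"time": 950.5, "request": {"url": "http://yahoo.com/api/feed"}},
            {"time": 30, "request": {"url": "http://yahoo.com/favicon.ico"}},
        ]
    }
}
for url, elapsed in slowest_requests(sample_timings, count=2):
    print(url, elapsed)
```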
