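I'm currently working with a very large Python codebase that has a lot of side effects and unexpected behavior. I'd like to get a handle on what it does by seeing every outbound HTTP request it makes, from any point in the call stack, over the whole course of its execution. Are there any utilities or integration paths that let me automatically profile the full set of network calls made by code written in Python?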
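Specifically, rather than using a separate external tool, I'd like to interact with the captured HTTP requests and responses programmatically, from within the profiled module or an adjacent one. For example: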
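- Logging requests/responses with the codebase's existing log handlers
- Publishing requests to an event broker such as Kafka
- Parsing them into a pandas DataFrame for analysis
- Integrating with an existing unittest or pytest suite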
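I've looked at the various observability tools on offer. For instance, Sentry appears to integrate automatically with Python's `httplib` to create a "breadcrumb" for each request; however, Sentry only records this information when an exception is raised, and by default it only publishes to its web UI. New Relic also lets you view "external services" calls as part of its application performance monitoring product, again through its own dashboard. In both cases, though, there is no officially supported Python handler that lets the tasks above happen inside the process that makes the outbound network requests.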
I looked at the source code of Sentry's Python SDK to see how it integrates with `http.client`, and adapted that approach in a more general way to fit my needs.
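Here is what I wrote to decorate the `http.client.HTTPConnection` object so that I get access to the requests, request bodies, and response objects. In this particular example, the data I want to collect is appended to global lists that live in the profiling module, and the same data is also logged to standard out. You can easily substitute whatever custom functionality you want for the calls to `list.append` and `logger.info`: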
```python
# http_profiler/connection_decorator.py
import logging
import sys
from http.client import HTTPConnection

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter(fmt="%(name)s %(funcName)s %(levelname)s: %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)

# Module-level lists that collect everything seen on any HTTPConnection.
put_request_content = []
get_response_content = []
request_bodies = []


def decorate_HTTPConnection():
    """Taken loosely from https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/integrations/stdlib.py"""
    # Keep references to the original methods so the wrappers can delegate to them.
    # No `global` statement is needed: the lists are mutated, never rebound.
    real_putrequest = HTTPConnection.putrequest
    real_getresponse = HTTPConnection.getresponse
    real__send_output = HTTPConnection._send_output

    def new_putrequest(self, method, url, skip_host=False, skip_accept_encoding=False):
        # Called once per request with the HTTP method and the request target (usually just the path).
        logger.info(f'{method}: {url}')
        put_request_content.append((method, url))
        return real_putrequest(self, method, url, skip_host=skip_host, skip_accept_encoding=skip_accept_encoding)

    def new_getresponse(self):
        # Record the response object once its status line and headers have been read.
        returned_response = real_getresponse(self)
        logger.info(returned_response)
        get_response_content.append(returned_response)
        return returned_response

    def new__send_output(self, message_body=None, encode_chunked=False):
        # Private helper that endheaders() calls with the request body (None when there is no body).
        logger.info(f'Message body: {message_body}')
        request_bodies.append(message_body)
        return real__send_output(self, message_body=message_body, encode_chunked=encode_chunked)

    # Replace the methods on the class itself, so every instance (including instances of
    # subclasses that inherit these methods) goes through the wrappers.
    HTTPConnection.putrequest = new_putrequest
    HTTPConnection.getresponse = new_getresponse
    HTTPConnection._send_output = new__send_output


# Patch at import time, so importing this module is enough to start profiling.
decorate_HTTPConnection()
```
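Instead of editing the wrappers for every new use, the same patch could dispatch to lists of callbacks, so that logging, a Kafka producer, a pandas collector, or a test can each register its own hook. The sketch below assumes that design; `request_hooks`, `body_hooks`, `response_hooks`, `install`, and `uninstall` are hypothetical names, not part of Sentry's SDK or the standard library. It also uses `functools.wraps`, guards against patching twice, and lets `uninstall()` put back whatever methods were on the class before `install()` ran.

```python
import functools
from http.client import HTTPConnection
from typing import Any, Callable, Dict, List

# Hypothetical hook registries; every hook also receives the connection (e.g. to read conn.host).
request_hooks: List[Callable[[HTTPConnection, str, str], Any]] = []  # hook(conn, method, url)
body_hooks: List[Callable[[HTTPConnection, Any], Any]] = []          # hook(conn, message_body)
response_hooks: List[Callable[[HTTPConnection, Any], Any]] = []      # hook(conn, response)

# Methods that were on the class before install(); empty while not installed.
_originals: Dict[str, Callable] = {}


def install() -> None:
    """Patch HTTPConnection so registered hooks see every request; a second call is a no-op."""
    if _originals:
        return
    _originals["putrequest"] = HTTPConnection.putrequest
    _originals["_send_output"] = HTTPConnection._send_output
    _originals["getresponse"] = HTTPConnection.getresponse
    real_putrequest = _originals["putrequest"]
    real_send_output = _originals["_send_output"]
    real_getresponse = _originals["getresponse"]

    @functools.wraps(real_putrequest)
    def putrequest(self, method, url, *args, **kwargs):
        for hook in request_hooks:
            hook(self, method, url)
        return real_putrequest(self, method, url, *args, **kwargs)

    @functools.wraps(real_send_output)
    def _send_output(self, message_body=None, *args, **kwargs):
        for hook in body_hooks:
            hook(self, message_body)
        return real_send_output(self, message_body, *args, **kwargs)

    @functools.wraps(real_getresponse)
    def getresponse(self, *args, **kwargs):
        response = real_getresponse(self, *args, **kwargs)
        for hook in response_hooks:
            hook(self, response)
        return response

    HTTPConnection.putrequest = putrequest
    HTTPConnection._send_output = _send_output
    HTTPConnection.getresponse = getresponse


def uninstall() -> None:
    """Restore the methods that were in place before install() (handy in test teardown)."""
    for name, func in _originals.items():
        setattr(HTTPConnection, name, func)
    _originals.clear()
```

Hooks run synchronously inside the request, so a slow or failing hook (a blocking Kafka producer, say) slows down or breaks the request itself; wrapping each hook call in `try`/`except`, or handing the data off to a queue, may be preferable outside of tests.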
Here is a very simple script I used to test its behavior:
```python
import logging
import sys

import requests

# Importing the profiler module also patches HTTPConnection as a side effect.
from http_profiler.connection_decorator import put_request_content, get_response_content, request_bodies

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter(fmt="%(name)s %(funcName)s %(levelname)s: %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)


# `url` has a default value so pytest can collect these tests without looking for a `url` fixture.
def test_profile_http_get_via_requests_library(url="https://example.com"):
    # Remember the list sizes so only entries added by this request are counted.
    prev_len_put_request_content = len(put_request_content)
    prev_len_get_response_content = len(get_response_content)
    prev_len_request_bodies = len(request_bodies)
    logger.info(f"Starting the test: GET {url}")
    resp = requests.get(url=url)
    assert resp is not None
    # One request line, one body (None for a GET) and one response should have been recorded.
    assert len(put_request_content) - prev_len_put_request_content == 1
    assert len(get_response_content) - prev_len_get_response_content == 1
    assert len(request_bodies) - prev_len_request_bodies == 1


def test_profile_http_post_via_requests_library(url="https://example.com", data=None):
    if data is None:
        data = {"message": "Hello world!"}
    prev_len_put_request_content = len(put_request_content)
    prev_len_get_response_content = len(get_response_content)
    prev_len_request_bodies = len(request_bodies)
    logger.info(f"Starting the test: POST {url} with {data}")
    resp = requests.post(url=url, data=data)
    assert resp is not None
    assert len(put_request_content) - prev_len_put_request_content == 1
    assert len(get_response_content) - prev_len_get_response_content == 1
    assert len(request_bodies) - prev_len_request_bodies == 1


if __name__ == "__main__":
    test_profile_http_get_via_requests_library("https://example.com")
    test_profile_http_post_via_requests_library("https://example.com")
    logger.info(f'Requests: {put_request_content}')
    logger.info(f'Request bodies: {request_bodies}')
    logger.info(f'Responses: {[f"{response.status} {response.reason}" for response in get_response_content]}')
```
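For the unittest/pytest use case, the before-and-after length bookkeeping in these tests could be replaced by a context manager that only captures what happens inside a `with` block. This is a sketch built around a hypothetical `capture_http()` helper and `CapturedHTTP` container. It patches the same three `HTTPConnection` methods on entry and restores whatever was installed before on exit, so it stacks on top of an existing decorator such as `decorate_HTTPConnection()` instead of replacing it.

```python
import contextlib
from dataclasses import dataclass, field
from http.client import HTTPConnection
from typing import Any, Iterator, List, Tuple


@dataclass
class CapturedHTTP:
    """Everything seen on HTTPConnection while capture_http() was active."""
    requests: List[Tuple[str, str]] = field(default_factory=list)  # (method, request target)
    bodies: List[Any] = field(default_factory=list)                # message_body passed to _send_output
    statuses: List[int] = field(default_factory=list)              # response status codes


@contextlib.contextmanager
def capture_http() -> Iterator[CapturedHTTP]:
    captured = CapturedHTTP()
    # Whatever is currently on the class, possibly already wrapped by another profiler.
    real_putrequest = HTTPConnection.putrequest
    real_send_output = HTTPConnection._send_output
    real_getresponse = HTTPConnection.getresponse

    def putrequest(self, method, url, *args, **kwargs):
        captured.requests.append((method, url))
        return real_putrequest(self, method, url, *args, **kwargs)

    def _send_output(self, message_body=None, *args, **kwargs):
        captured.bodies.append(message_body)
        return real_send_output(self, message_body, *args, **kwargs)

    def getresponse(self, *args, **kwargs):
        response = real_getresponse(self, *args, **kwargs)
        captured.statuses.append(response.status)
        return response

    HTTPConnection.putrequest = putrequest
    HTTPConnection._send_output = _send_output
    HTTPConnection.getresponse = getresponse
    try:
        yield captured
    finally:
        # Put back exactly what was there on entry, even if the block raised.
        HTTPConnection.putrequest = real_putrequest
        HTTPConnection._send_output = real_send_output
        HTTPConnection.getresponse = real_getresponse
```

In a test, `with capture_http() as captured: requests.get("https://example.com")` followed by `assert captured.requests == [("GET", "/")]` would express the same check as the GET test above. Because the patch is process-wide, requests made by other threads while the `with` block is open are captured too.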
Here is the output of the test script:
```
__main__ test_profile_http_get_via_requests_library INFO: Starting the test: GET https://example.com
http_profiler.connection_decorator new_putrequest INFO: GET: /
http_profiler.connection_decorator new__send_output INFO: Message body: None
http_profiler.connection_decorator new_getresponse INFO: <http.client.HTTPResponse object at 0x7ff40aa5df10>
__main__ test_profile_http_post_via_requests_library INFO: Starting the test: POST https://example.com with {'message': 'Hello world!'}
http_profiler.connection_decorator new_putrequest INFO: POST: /
http_profiler.connection_decorator new__send_output INFO: Message body: b'message=Hello+world%21'
http_profiler.connection_decorator new_getresponse INFO: <http.client.HTTPResponse object at 0x7ff40aa5deb0>
__main__ <module> INFO: Requests: [('GET', '/'), ('POST', '/')]
__main__ <module> INFO: Request bodies: [None, b'message=Hello+world%21']
__main__ <module> INFO: Responses: ['200 OK', '200 OK']
```
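The output shows two limitations of logging the raw arguments: `putrequest()` only sees the request target (here just `/`, without the host), and the response is logged as an object repr. Below is a sketch of two helpers, with the hypothetical names `full_url` and `describe_response`, that the wrappers could call, since they receive the connection as `self`. Inferring the scheme from the connection's `default_port` (443 for `http.client.HTTPSConnection`, and as far as I know also for urllib3's HTTPS connection class used by requests) is a heuristic, and for requests tunnelled through a proxy `conn.host` may be the proxy rather than the final destination.

```python
from http.client import HTTPConnection, HTTPResponse


def full_url(conn: HTTPConnection, url: str) -> str:
    """Rebuild an absolute URL from a connection and the target passed to putrequest()."""
    if "://" in url:
        # Already absolute (for example, when the request goes through a plain-HTTP proxy).
        return url
    # Heuristic: HTTPS connection classes use 443 as their default_port.
    scheme = "https" if conn.default_port == 443 else "http"
    # http.client stores IPv6 literals without brackets, so put them back for the URL.
    host = f"[{conn.host}]" if ":" in conn.host else conn.host
    # Only spell out the port when it differs from the class default.
    netloc = host if conn.port == conn.default_port else f"{host}:{conn.port}"
    return f"{scheme}://{netloc}{url}"


def describe_response(response: HTTPResponse) -> str:
    """Summarize a response from its status line and headers without reading the body."""
    content_type = response.getheader("Content-Type", "unknown content type")
    return f"{response.status} {response.reason} ({content_type})"
```

Inside `new_putrequest`, `logger.info(f'{method}: {full_url(self, url)}')` would log the absolute URL; inside `new_getresponse`, `logger.info(describe_response(returned_response))` would log something like `200 OK (text/html; charset=UTF-8)`. Neither helper reads the response body, so the calling library can still consume it normally.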