How to prevent connection timeouts when deleting SQS messages with Boto3



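I have a series of AWS Lambda functions fed by SQS queue event triggers. Sometimes, however, when I try to delete a message from the queue, the attempt times out again and again until the Lambda itself times out.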

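I enabled debug logging and confirmed that this is a socket timeout, but I get no further detail beyond that. It also appears to be irregular. At first I thought it was a Lambda warm-up issue, but I have seen the problem both after running the Lambda successfully several times and on the first run after a deployment.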
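For anyone wanting to reproduce the debug output, here is a minimal sketch of one way to turn it on using only the standard `logging` module. The helper name `enable_sdk_debug_logging` is made up for illustration; the logger names are simply the top-level package names of boto3, botocore and urllib3.

```python
import logging


def enable_sdk_debug_logging(level=logging.DEBUG):
    """Raise the log level of the AWS SDK loggers and return them."""
    # No-op if the root logger already has a handler attached
    logging.basicConfig()
    loggers = [logging.getLogger(name) for name in ("boto3", "botocore", "urllib3")]
    for logger in loggers:
        logger.setLevel(level)
    return loggers


enable_sdk_debug_logging()
```

Call it once at module import time, before the first client is created, so connection attempts and retry decisions are logged from the start.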

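What I have tried so far:

  • I thought using the Boto client versus the Boto resource might be the issue, but I see the same result with both approaches
  • I have raised the connect and read timeouts above their defaults; the connection is simply retried in the background by Boto's retry logic
  • I have tried lowering the connect timeout, but that only means more retries happen before the Lambda times out
  • I have tried both standard and FIFO queue types; both show the same problem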

Some other details:

  • Python v3.8.5
  • Boto3 v1.16.1
  • My SQS queue is configured with a 5 second delay and a 120 second visibility timeout
  • My Lambda timeout is 120 seconds

The code snippet I am using:

import boto3
from botocore.config import Config

# A single attempt, so a connect failure surfaces instead of being retried
config = Config(connect_timeout=30, read_timeout=30, retries={'total_max_attempts': 1}, region_name='us-east-1')
sqs_client = boto3.client(service_name='sqs', config=config)

record = event['Records'][0]
receiptHandle = record['receiptHandle']
# The queue name is the last segment of the event source ARN
fromQueueName = record['eventSourceARN'].split(':')[-1]
fromQueueUrl = sqs_client.get_queue_url(QueueName=fromQueueName)['QueueUrl']
messageDelete = sqs_client.delete_message(QueueUrl=fromQueueUrl, ReceiptHandle=receiptHandle)
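As a side note, the traceback in this question fails inside `get_queue_url`, which is an extra network round trip. A rough sketch of building the queue URL from the ARN instead is below. It assumes the standard `aws` partition and the regional URL layout `https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>`; the function name `queue_url_from_arn` is hypothetical. It saves one call but does nothing for a missing network path.

```python
def queue_url_from_arn(arn):
    """Turn arn:aws:sqs:<region>:<account-id>:<name> into a regional queue URL."""
    parts = arn.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sqs":
        raise ValueError(f"not an SQS queue ARN: {arn!r}")
    _, partition, _, region, account_id, queue_name = parts
    if partition != "aws":
        # Other partitions use different domains; this sketch does not handle them
        raise ValueError(f"unsupported partition: {partition!r}")
    return f"https://sqs.{region}.amazonaws.com/{account_id}/{queue_name}"
```

With the event from the snippet above, this would be called as `queue_url_from_arn(record['eventSourceARN'])`.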

An example of the DEBUG exception I am seeing:

[DEBUG] 2020-10-29T21:27:28.32Z 3c60cac9-6d99-58c6-84c9-92dc581919fd retry needed, retryable exception caught:
Connect timeout on endpoint URL: "https://queue.amazonaws.com/"
Traceback (most recent call last):
File "/var/task/urllib3/connection.py", line 159, in _new_conn
conn = connection.create_connection(
File "/var/task/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/var/task/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/python/botocore/httpsession.py", line 254, in send
urllib_response = conn.urlopen(
File "/var/task/urllib3/connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "/var/task/urllib3/util/retry.py", line 386, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/var/task/urllib3/packages/six.py", line 735, in reraise
raise value
File "/var/task/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/var/task/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/var/task/urllib3/connectionpool.py", line 978, in _validate_conn
conn.connect()
File "/var/task/urllib3/connection.py", line 309, in connect
conn = self._new_conn()
File "/var/task/urllib3/connection.py", line 164, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f27b56b7460>, 'Connection to queue.amazonaws.com timed out. (connect timeout=15)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/python/utils.py", line 79, in preflight_check
fromQueue = sqs_client.get_queue_url(QueueName=fromQueueName)
File "/opt/python/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/opt/python/botocore/client.py", line 662, in _make_api_call
http, parsed_response = self._make_request(
File "/opt/python/botocore/client.py", line 682, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/opt/python/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/opt/python/botocore/endpoint.py", line 136, in _send_request
while self._needs_retry(attempts, operation_model, request_dict,
File "/opt/python/botocore/endpoint.py", line 253, in _needs_retry
responses = self._event_emitter.emit(
File "/opt/python/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/opt/python/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/opt/python/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/opt/python/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/opt/python/botocore/retryhandler.py", line 250, in __call__
should_retry = self._should_retry(attempt_number, response,
File "/opt/python/botocore/retryhandler.py", line 277, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/opt/python/botocore/retryhandler.py", line 316, in __call__
checker_response = checker(attempt_number, response,
File "/opt/python/botocore/retryhandler.py", line 222, in __call__
return self._check_caught_exception(
File "/opt/python/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/opt/python/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/opt/python/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/opt/python/botocore/httpsession.py", line 287, in send
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://queue.amazonaws.com/"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/python/botocore/retryhandler.py", line 269, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/opt/python/botocore/retryhandler.py", line 316, in __call__
checker_response = checker(attempt_number, response,
File "/opt/python/botocore/retryhandler.py", line 222, in __call__
return self._check_caught_exception(
File "/opt/python/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/opt/python/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/opt/python/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/opt/python/botocore/httpsession.py", line 287, in send
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://queue.amazonaws.com/"

Based on the comments.

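The SQS timeouts occurred because the Lambda function is associated with a VPC, and that VPC has no VPC interface endpoint for SQS. Without such an endpoint or a NAT gateway, the function cannot connect to SQS.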
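To check whether a function has a network path to SQS at all, a quick TCP probe with a short timeout can fail fast instead of burning the whole Lambda timeout. This is only a sketch using the standard library; the helper name `can_connect` and the 3 second default are my own choices, not anything from Boto3.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers socket timeouts, refused connections and DNS resolution failures
        return False
```

Calling `can_connect("queue.amazonaws.com")` (the host from the traceback) at the top of the handler and logging the result should show `False` when the function sits in a VPC with no route to the service.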

The solution is to add a VPC interface endpoint for the SQS service.
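A minimal sketch of creating such an endpoint with Boto3's EC2 client follows. The two function names are hypothetical, and the VPC, subnet and security group IDs must be your own; `create_vpc_endpoint` and its parameters are the real EC2 API. The request is built separately so it can be inspected without touching AWS.

```python
def build_sqs_endpoint_request(vpc_id, subnet_ids, security_group_ids, region="us-east-1"):
    """Build keyword arguments for the EC2 create_vpc_endpoint call."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.sqs",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the regional SQS hostname resolve to the endpoint's private IPs
        "PrivateDnsEnabled": True,
    }


def create_sqs_endpoint(vpc_id, subnet_ids, security_group_ids, region="us-east-1"):
    """Create the endpoint; needs boto3 and EC2 permissions."""
    import boto3  # imported lazily so the request builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    params = build_sqs_endpoint_request(vpc_id, subnet_ids, security_group_ids, region)
    return ec2.create_vpc_endpoint(**params)
```

The endpoint's security group has to allow inbound HTTPS (port 443) from the function, and private DNS requires DNS support and DNS hostnames to be enabled on the VPC. As far as I understand the SQS documentation, private DNS does not cover legacy hostnames such as `queue.amazonaws.com` (the one in the traceback), so the client may also need `endpoint_url='https://sqs.us-east-1.amazonaws.com'`.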
