CloudFormation stack deletion fails to delete VPC



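I created AWS infrastructure with CloudFormation, including EC2, Redshift, a VPC and so on. Now I want to delete it in a specific reverse order. All resources depend on the VPC, so the VPC should be deleted last. Every other stack deletes, but the VPC stack does not delete when I use Python and Boto3. It shows a subnet or network interface dependency error. When I delete it through the console, however, it succeeds. Has anyone faced this issue?

I tried deleting all the attached load balancers, but the VPC still does not delete.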

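AWS CloudFormation builds a dependency graph between resources based on the DependsOn attributes and the references between resources in the template.

It then tries to deploy resources in parallel while respecting those dependencies.

For example, a subnet might be defined as: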

Subnet1:
  Type: AWS::EC2::Subnet
  Properties:
    CidrBlock: 10.0.0.0/24
    VpcId: !Ref ProdVPC

Here there is an explicit reference to ProdVPC, so CloudFormation creates Subnet1 only after ProdVPC has been created.

When a CloudFormation stack is deleted, the reverse logic applies: Subnet1 is deleted before ProdVPC.
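The ordering described above can be illustrated with a small topological sort. This is only a sketch of the idea, not CloudFormation's actual implementation: the `dependencies` mapping is a hypothetical stand-in for a parsed template, `Instance1` is an invented resource added to make the chain longer, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical template: each resource maps to the resources it references.
dependencies = {
    "ProdVPC": set(),
    "Subnet1": {"ProdVPC"},
    "Instance1": {"Subnet1"},  # invented resource, for illustration only
}

# static_order() yields a resource only after everything it depends on.
creation_order = list(TopologicalSorter(dependencies).static_order())

# Deletion walks the same order backwards, so the VPC goes last.
deletion_order = list(reversed(creation_order))

print("create:", creation_order)
print("delete:", deletion_order)
```

Anything that is not in the mapping, such as a resource created outside the stack, never appears in either order, which is why it can block a delete.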

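However, CloudFormation is not aware of resources created outside the stack. This means that if a resource (such as an Amazon EC2 instance) was created inside the subnet, stack deletion will fail, because a subnet cannot be deleted while an EC2 instance is using it (or, more precisely, while an ENI is attached to it).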

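In that case you need to manually delete the resources that are causing the "delete failed" status, and then run the delete command again.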
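Since the question uses Boto3, here is a minimal sketch of how the blocking resources might be identified from the stack events before retrying. The function names and the stack name in the usage note are placeholders of mine; `describe_stack_events`, `delete_stack` and the `stack_delete_complete` waiter are existing Boto3 CloudFormation calls. Only the first page of events is read, which I assume is enough for a recent failure.

```python
def failed_deletions(cfn_client, stack_name):
    """Return (logical id, reason) for every DELETE_FAILED event of a stack."""
    # Only the first page of events is inspected; events come newest first.
    response = cfn_client.describe_stack_events(StackName=stack_name)
    return [
        (event["LogicalResourceId"], event.get("ResourceStatusReason", ""))
        for event in response["StackEvents"]
        if event["ResourceStatus"] == "DELETE_FAILED"
    ]


def retry_delete(cfn_client, stack_name):
    """Request deletion again and block until the stack is gone."""
    # Call this only after the blocking resources have been removed by hand.
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

With `cfn = boto3.client("cloudformation")`, you could print the result of `failed_deletions(cfn, "my-vpc-stack")`, clean up whatever the reasons point to, and then call `retry_delete(cfn, "my-vpc-stack")`.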

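A good way to find such resources is to look at the Network Interfaces section of the EC2 management console. Make sure no interfaces are connected to the VPC.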
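The same check can be scripted. The sketch below assumes an EC2 client is passed in; `interfaces_in_vpc` is a name I made up, and the VPC ID in the usage note is a placeholder. It relies on the `describe_network_interfaces` paginator and the `vpc-id` filter, both of which exist in Boto3.

```python
def interfaces_in_vpc(ec2_client, vpc_id):
    """Return (id, status, description) for each ENI still in the VPC."""
    paginator = ec2_client.get_paginator("describe_network_interfaces")
    pages = paginator.paginate(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])
    return [
        (eni["NetworkInterfaceId"], eni["Status"], eni.get("Description", ""))
        for page in pages
        for eni in page["NetworkInterfaces"]
    ]
```

For example, `interfaces_in_vpc(boto3.client("ec2"), "vpc-0123456789abcdef0")` should return an empty list before you retry the stack deletion. The description usually hints at which service owns a leftover interface.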

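Since you mention having trouble deleting the VPC in a stack that contains Lambda functions, and the Lambdas themselves run inside the VPC, the most likely cause is the network interfaces that Lambda generates to connect to other resources in the VPC.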

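Technically, these network interfaces should be deleted automatically when the Lambda is undeployed from the stack, but in my experience I have seen orphaned ENIs that prevent the VPC from being undeployed.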

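For this reason, I created a custom-resource-backed Lambda that cleans up the ENIs after all Lambdas in the VPC have been undeployed.

This is the CloudFormation part, where you set up the custom resource and pass in the VPC ID: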

##############################################
#                                            #
#  Custom resource deleting net interfaces   #
#                                            #
##############################################
NetInterfacesCleanupFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: src
    Handler: cleanup/network_interfaces.handler
    Role: !GetAtt BasicLambdaRole.Arn
    DeploymentPreference:
      Type: AllAtOnce
    Timeout: 900
PermissionForNewInterfacesCleanupLambda:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:invokeFunction
    FunctionName:
      Fn::GetAtt: [ NetInterfacesCleanupFunction, Arn ]
    Principal: lambda.amazonaws.com
InvokeLambdaFunctionToCleanupNetInterfaces:
  DependsOn: [PermissionForNewInterfacesCleanupLambda]
  Type: Custom::CleanupNetInterfacesLambda
  Properties:
    ServiceToken: !GetAtt NetInterfacesCleanupFunction.Arn
    StackName: !Ref AWS::StackName
    VPCID:
      Fn::ImportValue: !Sub '${MasterStack}-Articles-VPC-Ref'
    Tags:
      'owner': !Ref StackOwner
      'task': !Ref Task

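And this is the corresponding Lambda. It makes up to three attempts to detach and delete the orphaned network interfaces. If it still fails, that means some Lambda is still generating new network interfaces, and you need to debug that.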

import sys
from time import sleep

import boto3
from botocore.exceptions import ClientError

# Fix this wherever your custom resource handler code is
from common import cfn_custom_resources as csr

MAX_RETRIES = 3
client = boto3.client('ec2')


def handler(event, context):
    if not csr.__is_valid_event(event, context):
        result = {'result': 'Invalid custom resource event'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
        return
    elif event['RequestType'] in ('Create', 'Update'):
        # The cleanup only runs on Delete; acknowledge Create/Update right away
        result = {'result': "Don't trigger the rest of the code"}
        csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        return
    try:
        vpc_id = event['ResourceProperties']['VPCID']
        # Get all network interfaces for the given VPC which are attached to a lambda function
        interfaces = client.describe_network_interfaces(
            Filters=[
                {
                    'Name': 'description',
                    'Values': ['AWS Lambda VPC ENI*']
                },
                {
                    'Name': 'vpc-id',
                    'Values': [vpc_id]
                },
            ],
        )
        failed_detach = list()
        failed_delete = list()
        # Detach the above found network interfaces
        for interface in interfaces['NetworkInterfaces']:
            detach_interface(failed_detach, interface)
        # Try detach a second time and delete each simultaneously
        for interface in interfaces['NetworkInterfaces']:
            detach_and_delete_interface(failed_detach, failed_delete, interface)
        # Succeed only if nothing failed to detach and nothing failed to delete
        if not failed_detach and not failed_delete:
            result = {'result': 'Network interfaces detached and deleted successfully'}
            csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        else:
            result = {'result': "Network interfaces couldn't be deleted completely"}
            csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
    except Exception:
        # Also reached when the retries in detach_and_delete_interface are exhausted
        print("Unexpected error:", sys.exc_info())
        result = {'result': 'Some error with the process of detaching and deleting the network interfaces'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))


def detach_interface(failed_detach, interface):
    try:
        if interface['Status'] == 'in-use':
            detach_response = client.detach_network_interface(
                AttachmentId=interface['Attachment']['AttachmentId'],
                Force=True
            )
            # Sleep for 1 sec after every detachment
            sleep(1)
            print(f"Detach response for {interface['NetworkInterfaceId']}- {detach_response}")
            if detach_response['ResponseMetadata'].get('HTTPStatusCode') != 200:
                failed_detach.append(detach_response)
    except ClientError as e:
        # Detach errors are only logged; the delete step below retries the detach
        print(f"Exception details - {e}")


def detach_and_delete_interface(failed_detach, failed_delete, interface, retries=0):
    detach_interface(failed_detach, interface)
    # Back off a little longer on every retry
    sleep(retries + 1)
    try:
        delete_response = client.delete_network_interface(
            NetworkInterfaceId=interface['NetworkInterfaceId'])
        print(f"Delete response for {interface['NetworkInterfaceId']}- {delete_response}")
        if delete_response['ResponseMetadata'].get('HTTPStatusCode') != 200:
            failed_delete.append(delete_response)
    except ClientError as e:
        print(f"Exception while deleting - {str(e)}")
        # Strictly less than, so that at most MAX_RETRIES retries are made
        if retries < MAX_RETRIES:
            if e.response['Error']['Code'] in ('InvalidNetworkInterface.InUse',
                                               'InvalidParameterValue'):
                retries = retries + 1
                print(f"Retry {retries} : Interface in use, deletion failed, retrying to detach and delete")
                detach_and_delete_interface(failed_detach, failed_delete, interface, retries)
            else:
                raise RuntimeError("Code not found in error")
        else:
            raise RuntimeError("Max Number of retries exhausted to remove the interface")
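The `common.cfn_custom_resources` helper imported above is not shown in this answer. As a rough idea of what its `send` function has to do, here is a sketch based on the documented custom resource protocol: the handler must PUT a JSON body to the pre-signed `ResponseURL` found in the event. The function names and the default `Reason` text are my assumptions, not the author's actual module.

```python
import json
import urllib.request


def build_response(event, context, status, data, reason=None):
    """Build the JSON body CloudFormation expects from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or f"See CloudWatch log stream {context.log_stream_name}",
        # Reuse the existing physical id on Update/Delete, else invent one
        "PhysicalResourceId": event.get("PhysicalResourceId", context.log_stream_name),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    }


def send(event, context, status, data):
    """PUT the response to the pre-signed URL supplied in the event."""
    body = json.dumps(build_response(event, context, status, data)).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        # An empty content type keeps urllib from adding a form-encoded default
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        print(f"CloudFormation responded with HTTP {response.status}")
```

If no response ever reaches that URL, the stack operation waits until it times out, which is why the handler above sends FAILED from its `except` branch instead of letting the exception escape.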

Link to the Lambda: https://gist.github.com/revolutionisme/8ec785f8202f47da5517c295a28c7cb5

More information on configuring Lambda in a VPC: https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
