Passing values from a Glue Connection resource to a Python job



In my AWS::Glue::Connection resource, I have set up all the credentials required to access my SQL Server database.

GlueJDBCConnection:
  Type: AWS::Glue::Connection
  Properties:
    CatalogId: !Ref AWS::AccountId
    ConnectionInput:
      ConnectionType: "JDBC"
      ConnectionProperties:
        USERNAME: !Ref Username
        PASSWORD: !Ref Password
        JDBC_CONNECTION_URL: !Ref GlueJDBCStringTarget
        sslMode: 'REQUIRED'
      PhysicalConnectionRequirements:
        AvailabilityZone: !If [IsProd, !Ref AvailabilityZoneProd, !Ref AvailabilityZoneNonProd]
        SecurityGroupIdList:
          - Fn::GetAtt: GlueJobSecurityGroup.GroupId
        SubnetId: !If [IsProd, !Ref PrivateSubnetAz2, !Ref PrivateSubnetAz3]
      Name: !Ref JDBCConnectionName

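I need to use USERNAME and PASSWORD in my Python script, but I don't want them exposed in the Job Parameters section of the AWS console. Is there another way to do this instead of what I have done below?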

GlueJob:
  Type: AWS::Glue::Job
  DependsOn: GlueSecurityConfiguration
  Properties:
    Name: !Ref GlueJobName
    Role: !Ref RoleForRTMI
    SecurityConfiguration: !Ref SecurityConfiguration
    Command:
      Name: glueetl
      PythonVersion: 3
      ScriptLocation: !Sub 's3://xyz-${AWS::AccountId}-xx-xxxx-0/${blablabla}'
    DefaultArguments:
      '--USER': !Ref Username
      '--PASS': !Ref Password
    Connections:
      Connections:
        - Ref: GlueJDBCConnection
    ExecutionProperty:
      MaxConcurrentRuns: 2
    #MaxCapacity: 2 #if used, don't use WorkerType and NumberOfWorkers
    WorkerType: G.1X
    NumberOfWorkers: 2
    MaxRetries: 1
    GlueVersion: '2.0'
    Tags:
      name: value_1

Python example:

class FrameWriter:
    def __init__(self, environment: str, context: GlueContext):
        self.environment = environment
        self.context = context

    def write_frame(self, table_name: str, spark_df: DataFrame, rds_user: str, rds_pass: str):

        rds_creds = glue_rds_cred(self.environment)
        rds_user = rds_user
        rds_pass = rds_pass
        rds_url = dict_recursive_lookup("JDBC_CONNECTION_URL", rds_creds)
        glue_df = DynamicFrame.fromDF(spark_df, self.context, "glue_df")
        glue_table = table_name
        self.context.write_dynamic_frame.from_options(
            frame=glue_df,
            connection_type='sqlserver',
            connection_options={
                "url": f"{rds_url}/db_name",
                "user": f"{rds_user}",
                "password": f"{rds_pass}",
                "dbtable": f"rdm.{glue_table}",
            },
            transformation_ctx="output",
        )

writer = FrameWriter(environment, glue_context)
writer.write_frame(name, sp_df, args["USER"], args["PASS"])

I came up with the code below, which uses boto3 to fetch the username and password so that they are not exposed in the AWS Glue console:

from typing import Optional

import boto3


def glue_rds_cred(environment) -> dict:
    # Fetch the full connection definition, including the unmasked password.
    client_glue = boto3.client("glue")
    response_rds_pass = client_glue.get_connection(
        # CatalogId='string',
        Name=f"instance_name-{environment}",
        HidePassword=False,
    )
    return response_rds_pass


def dict_recursive_lookup(k: str, d: dict) -> Optional[str]:
    # Depth-first search for key k in a nested dict; returns None if absent.
    if k in d:
        return d[k]
    for v in d.values():
        if isinstance(v, dict):
            a = dict_recursive_lookup(k, v)
            if a is not None:
                return a
    return None
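
For what it's worth, the `get_connection` response has a fixed shape (`Connection` → `ConnectionProperties`), so the user name, password and URL can probably be read directly rather than searched for recursively. Below is a minimal sketch under a few assumptions: `get_jdbc_credentials` and its `glue_client` parameter are hypothetical names introduced here for illustration, the connection was created with the `USERNAME`, `PASSWORD` and `JDBC_CONNECTION_URL` properties shown in the template above, and the job's role is allowed to call `glue:GetConnection`.

```python
from typing import Any, Dict, Optional


def get_jdbc_credentials(connection_name: str,
                         glue_client: Optional[Any] = None) -> Dict[str, str]:
    """Read user, password and JDBC URL from a Glue connection (hypothetical helper)."""
    if glue_client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3
        glue_client = boto3.client("glue")

    # HidePassword=False asks Glue to return the password in the response.
    response = glue_client.get_connection(Name=connection_name, HidePassword=False)
    props = response["Connection"]["ConnectionProperties"]
    return {
        "user": props["USERNAME"],
        "password": props["PASSWORD"],
        "url": props["JDBC_CONNECTION_URL"],
    }
```

With something like this, `write_frame` could call `get_jdbc_credentials(f"instance_name-{self.environment}")` and would no longer need the `rds_user` and `rds_pass` arguments, so the `'--USER'` and `'--PASS'` entries in `DefaultArguments` could be dropped from the job definition.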
