How do I pass arguments (entryPointArguments) to a Spark job on EMR Serverless?



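**I'm trying to run my PySpark script and pass it some arguments through the entryPointArguments parameter of boto3 (the emr-serverless client), but it doesn't work at all, and I'd like to know whether I'm doing it the right way.**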

**My Python code looks like this:**

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-env', nargs='?', metavar='Environment', type=str,
                        help='String: Environment to run. Options: [dev, prd]',
                        choices=['dev', 'prd'],
                        required=True,
                        default="prd")
    # Capture args
    args = parser.parse_args()
    env = args.env
    print(f"HELLO WORLD FROM {env}")
**And my script that runs EMR Serverless looks like this:**

    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://example-bucket-us-east-1-codes-prd/hello_world.py",
            "entryPointArguments": ["-env prd"],
            "sparkSubmitParameters": (
                "--conf spark.executor.cores=2 "
                "--conf spark.executor.memory=4g "
                "--conf spark.driver.cores=2 "
                "--conf spark.driver.memory=8g "
                "--conf spark.executor.instances=1 "
                "--conf spark.dynamicAllocation.maxExecutors=12"
            ),
        }
    }

**I've already tried single quotes and double quotes, and I've tried passing these arguments through "sparkSubmitParameters", but so far nothing works. There aren't many examples of how to do this online, so I'm hoping someone has already done it successfully. Thank you!**

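I tested it and eventually figured out how to do this. As far as I can tell, when the argument looks like this:

    -env prd

you have to pass it in entryPointArguments like this:

    ["-env", "prd"]

That is, split the flag from its value and pass each one as a separate list element.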
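
The difference can be reproduced locally without EMR, assuming each element of entryPointArguments reaches the script as one `sys.argv` entry (which is what the behaviour described above suggests). The sketch below uses a simplified version of the question's parser; the helper names `build_parser` and `try_parse` are made up for illustration.

```python
import argparse


def build_parser():
    # Simplified version of the option defined in the question's script.
    parser = argparse.ArgumentParser()
    parser.add_argument('-env', type=str, choices=['dev', 'prd'], required=True)
    return parser


def try_parse(argv):
    """Return the parsed env, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv).env
    except SystemExit:
        # argparse prints a usage error to stderr and exits on bad input.
        return None


# One list element per token: the flag and its value are separate entries.
split_result = try_parse(["-env", "prd"])

# Flag and value fused into one element, as in the question.
fused_result = try_parse(["-env prd"])

print(split_result, fused_result)
```

With the fused form, argparse sees a single token containing a space, does not match it to `-env`, and exits with a usage error, so the job would fail before any Spark code runs.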

To pass arguments to your application, specify the entryPointArguments setting in the sparkSubmit section of the command.
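
For the boto3 client used in the question, the same idea might look like the sketch below. `build_job_driver` and `start_job` are hypothetical helpers, not part of boto3; only `boto3.client("emr-serverless")` and `start_job_run` with `applicationId`, `executionRoleArn` and `jobDriver` are real API names. `shlex.split` is used here as one convenient way to turn a command-line string into separate tokens.

```python
import shlex


def build_job_driver(entry_point, script_args, spark_conf=None):
    """Build the jobDriver dict for start_job_run (hypothetical helper)."""
    spark_submit = {
        "entryPoint": entry_point,
        # shlex.split turns "-env prd" into ["-env", "prd"].
        "entryPointArguments": shlex.split(script_args),
    }
    if spark_conf:
        # sparkSubmitParameters is a single string of spark-submit options.
        spark_submit["sparkSubmitParameters"] = " ".join(
            f"--conf {key}={value}" for key, value in spark_conf.items()
        )
    return {"sparkSubmit": spark_submit}


def start_job(application_id, execution_role_arn, job_driver):
    """Submit the job; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("emr-serverless")
    return client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver=job_driver,
    )


job_driver = build_job_driver(
    "s3://example-bucket-us-east-1-codes-prd/hello_world.py",
    "-env prd",
    {"spark.executor.cores": 2, "spark.executor.memory": "4g"},
)
print(job_driver)
```

Keeping the dict construction separate from the AWS call makes it easy to inspect the argument list before submitting anything.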

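Below is a complete AWS CLI command that runs a job on an EMR Serverless application and passes named arguments to a Python script containing PySpark code. The additional parameters in the sparkSubmit part of the command make a package (utilities.zip) and a JAR file (JDBC_Driver.jar) available to the Spark executors so that the application can use them. The --execution-role-arn value should come from IAM, and --application-id identifies the EMR Serverless application (which must be created beforehand) that will run the job.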

    aws emr-serverless start-job-run \
        --execution-role-arn arn:aws:iam::123456:role/RoleName \
        --application-id 1234567 \
        --job-driver '{
            "sparkSubmit": {
                "entryPoint": "s3://MyS3Bucket/dir/pyspark/spark_app.py",
                "entryPointArguments": [
                    "--s3",
                    "MyS3Bucket",
                    "--prefix",
                    "dir/pyspark",
                    "--env",
                    "dev"
                ],
                "sparkSubmitParameters": "--conf spark.submit.pyFiles=s3://MyS3Bucket/dir/pyspark/utilities.zip --jars s3://MyS3Bucket/dir/drivers/JDBC_Driver.jar"
            }
        }' \
        --configuration-overrides '{
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {
                    "logUri": "s3://MyS3Bucket/logs/"
                }
            }
        }'
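
On the receiving side, the script at entryPoint can read these named arguments with argparse. This is a minimal sketch: the option names come from the command above, while `parse_job_args`, the `dev`/`prd` choices (borrowed from the question) and the rest of the script body are assumptions, since spark_app.py itself is not shown.

```python
import argparse


def parse_job_args(argv=None):
    """Parse the named arguments passed via entryPointArguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--s3", required=True, help="bucket name")
    parser.add_argument("--prefix", required=True, help="key prefix inside the bucket")
    # The allowed values are an assumption taken from the question's script.
    parser.add_argument("--env", choices=["dev", "prd"], default="dev")
    # With argv=None, argparse reads sys.argv[1:], as it would inside the job.
    return parser.parse_args(argv)


# Local check with the same token list as the CLI example.
args = parse_job_args(
    ["--s3", "MyS3Bucket", "--prefix", "dir/pyspark", "--env", "dev"]
)
print(f"s3://{args.s3}/{args.prefix} ({args.env})")
```

Passing an explicit list to `parse_job_args` lets you check the parser on your own machine before submitting a job.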
