我在AWS EMR系统上提交申请时收到以下错误。以客户端模式提交spark应用程序效果良好。请让我知道是否有任何其他配置需要做,以便在aws-emr的集群模式下工作。
[hadoop@ip-172-31-81-182 ~]$ spark-submit --master yarn --deploy-mode cluster --executor-memory 1G --num-executors 1 --driver-memory 1g --executor-cores 1 --conf spark.yarn.submit.waitAppCompletion=false --class WordCount.word.App /home/hadoop/word.jar s3n://bucket1/text.txt s3n://bucket1/output/ s3n://bucket1/analysis1/user.parquet
18/12/13 11:26:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/12/13 11:26:07 INFO RMProxy: Connecting to ResourceManager at ip-172-31-81-182.ec2.internal/172.31.81.182:8032
18/12/13 11:26:07 INFO Client: Requesting a new application from cluster with 1 NodeManagers
18/12/13 11:26:07 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6144 MB per container)
18/12/13 11:26:07 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/12/13 11:26:07 INFO Client: Setting up container launch context for our AM
18/12/13 11:26:07 INFO Client: Setting up the launch environment for our AM container
18/12/13 11:26:07 INFO Client: Preparing resources for our AM container
18/12/13 11:26:09 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/12/13 11:26:11 INFO Client: Uploading resource file:/mnt/tmp/spark-e01877e4-eb8c-4f6a-a6f5-c5c769c9c21e/__spark_libs__2272513134347036396.zip -> hdfs://ip-172-31-81-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1544697633631_0011/__spark_libs__2272513134347036396.zip
18/12/13 11:26:13 INFO Client: Uploading resource file:/home/hadoop/word.jar -> hdfs://ip-172-31-81-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1544697633631_0011/word.jar
18/12/13 11:26:15 INFO Client: Uploading resource file:/mnt/tmp/spark-e01877e4-eb8c-4f6a-a6f5-c5c769c9c21e/__spark_conf__8515846431603225843.zip -> hdfs://ip-172-31-81-182.ec2.internal:8020/user/hadoop/.sparkStaging/application_1544697633631_0011/__spark_conf__.zip
18/12/13 11:26:15 INFO SecurityManager: Changing view acls to: hadoop
18/12/13 11:26:15 INFO SecurityManager: Changing modify acls to: hadoop
18/12/13 11:26:15 INFO SecurityManager: Changing view acls groups to:
18/12/13 11:26:15 INFO SecurityManager: Changing modify acls groups to:
18/12/13 11:26:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/12/13 11:26:15 INFO Client: Submitting application application_1544697633631_0011 to ResourceManager
18/12/13 11:26:16 INFO YarnClientImpl: Submitted application application_1544697633631_0011
18/12/13 11:26:16 INFO Client: Application report for application_1544697633631_0011 (state: ACCEPTED)
18/12/13 11:26:16 INFO Client:
client token: N/A
diagnostics: [Thu Dec 13 11:26:16 +0000 2018] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:6144, vCores:4> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1544700376013
final status: UNDEFINED
tracking URL: http://ip-172-31-81-182.ec2.internal:20888/proxy/application_1544697633631_0011/
user: hadoop
18/12/13 11:26:16 INFO ShutdownHookManager: Shutdown hook called
18/12/13 11:26:16 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-e01877e4-eb8c-4f6a-a6f5-c5c769c9c21e
18/12/13 11:26:16 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-0762aaf6-577a-4ad7-a4a1-c4c16a590feb
[hadoop@ip-172-31-81-182 ~]$
要在纱线集群模式下执行火花作业时检查实际日志消息,请使用以下命令:
yarn logs -applicationId <application ID>
对于上述运行,将使用application_1544697633631_0011。