Spark error: executor XXX finished with state EXITED message Command exited with code 1 exitStatus 1



I built a standalone Spark cluster on Oracle Linux. I added this line to spark-env.sh on the master:

export SPARK_MASTER_HOST=x.x.x.x

and added these lines to spark-env.sh on both the master and the worker:

export PYSPARK_PYTHON=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.8
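Both interpreter paths must exist on every node, or the executors will fail at launch; a quick way to check (the user and IP below are placeholders for your own):

# Run on the master and on each worker; the path set in
# spark-env.sh must resolve identically everywhere.
/usr/bin/python3.8 --version

# Or check a worker remotely over SSH:
ssh spark@x.x.x.x '/usr/bin/python3.8 --version'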

I also added the worker's IP to the workers file on both the master and the worker. I start the Spark cluster like this. On the master:

/opt/spark/sbin/start-master.sh

On the worker:

/opt/spark/sbin/start-worker.sh spark://x.x.x.x:7077
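For reference, the workers file mentioned above is conf/workers, one worker address per line; with passwordless SSH from the master to the workers, the whole cluster can also be started in one step. A sketch, with the IP as a placeholder:

# /opt/spark/conf/workers -- one worker host/IP per line
x.x.x.x

# With passwordless SSH configured, start the master and all
# workers listed in conf/workers at once:
/opt/spark/sbin/start-all.sh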

So I have one worker and one master. My ~/.bashrc is configured as follows:

export JAVA_HOME=/opt/oracle/java/jdk1.8.0_25
export PATH=$JAVA_HOME/bin:$PATH
alias python=/usr/bin/python3.8
export LD_LIBRARY_PATH=/opt/oracle/instantclient_21_4:$LD_LIBRARY_PATH
export PATH=/opt/oracle/instantclient_21_4:$PATH
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_HOME=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=python3.8
export PYSPARK_PYTHON=/usr/bin/python3.8    
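After sourcing ~/.bashrc, a quick sanity check that everything resolves as intended (note the alias line only takes effect in interactive shells):

source ~/.bashrc
echo $SPARK_HOME        # expect /opt/spark
java -version           # expect 1.8.0_25
python3.8 --version     # the interpreter PySpark will use
spark-submit --version  # confirms Spark is on the PATH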

When I run spark-submit, I get no errors, but the command runs forever and never produces a result. I see these lines:

22/03/04 12:07:40 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks resource profile 0
22/03/04 12:07:41 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20220304120738-0000/0 is now EXITED (Command exited with code 1)
22/03/04 12:07:41 INFO StandaloneSchedulerBackend: Executor app-20220304120738-0000/0 removed: Command exited with code 1
22/03/04 12:07:41 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20220304120738-0000/3 on worker-20220304120443-192.9.200.68-42185 (192.9.200.68:42185) with 2 core(s)
22/03/04 12:07:41 INFO StandaloneSchedulerBackend: Granted executor ID app-20220304120738-0000/3 on hostPort 192.9.200.68:42185 with 2 core(s), 2.0 GiB RAM

When I check the worker log, I find this error:

22/03/04 12:07:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with m$
22/03/04 12:07:38 INFO ExecutorRunner: Launch command: "/opt/oracle/java/jdk1.8.0_25/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx2048M" "-Dspark.driver.port=40345" "-XX:+PrintGC$
22/03/04 12:07:38 INFO ExecutorRunner: Launch command: "/opt/oracle/java/jdk1.8.0_25/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx2048M" "-Dspark.driver.port=40345" "-XX:+PrintGC$
22/03/04 12:07:38 INFO ExecutorRunner: Launch command: "/opt/oracle/java/jdk1.8.0_25/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx2048M" "-Dspark.driver.port=40345" "-XX:+PrintGC$
22/03/04 12:07:41 INFO Worker: Executor app-20220304120738-0000/0 finished with state EXITED message Command exited with code 1 exitStatus 1
22/03/04 12:07:41 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 0
22/03/04 12:07:41 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220304120738-0000, execId=0)

My spark-submit command looks like this:

/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --files etl/sparkConfig.json --py-files etl/brn_utils.py,etl/cst.py,etl/cst_utils.py,etl/emp_utils.py,etl/general_utils.py,etl/grouping.py,etl/grp_state.py,etl/conn.py etl/main.py
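As an aside, instead of listing every helper module in --py-files, they can be shipped as a single archive; a sketch, where etl.zip is just a name I chose:

# Bundle the modules once (files sit at the zip root, so the
# import names stay the same; including main.py is harmless):
cd etl && zip ../etl.zip *.py && cd ..

# ...then pass the archive instead of the comma-separated list:
/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 \
  --files etl/sparkConfig.json \
  --py-files etl.zip \
  etl/main.py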

I tried as the root user and also created a dedicated spark user, but nothing changed.

Can you tell me what is wrong?

Thanks.

The problem is solved.

I think it was caused by a network issue. Everything has worked fine since I added this option to spark-submit:

--conf spark.driver.host=x.x.x.x

So the full command I run is:

/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --conf spark.driver.host=x.x.x.x --files etl/sparkConfig.json --py-files etl/brn_utils.py,etl/cst.py,etl/cst_utils.py,etl/emp_utils.py,etl/general_utils.py,etl/grouping.py,etl/grp_state.py,etl/conn.py etl/main.py
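If you don't want to repeat the flag on every submit, the same setting can also live in spark-defaults.conf on the machine you submit from (a sketch; x.x.x.x is again the driver machine's address):

# /opt/spark/conf/spark-defaults.conf
spark.driver.host    x.x.x.x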

Please note that the program must be copied to the same path on all nodes. Also, since I access the cluster remotely, I use an SSH tunnel to reach the UI from my own computer, like this:

ssh spark@master_ip -N -L 4040:master_ip:8080

In the command above, 4040 is the port on my own computer and 8080 is the master UI port on the remote machine. Once the SSH tunnel is established, I can open the Spark master UI in my browser at localhost:4040.
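The running application also has its own UI on port 4040 of the driver, so (assuming the driver, i.e. spark-submit, runs on the master) a second tunnel can expose it the same way; local port 4041 is an arbitrary choice:

# Forward the application UI alongside the master UI:
ssh spark@master_ip -N -L 4041:master_ip:4040
# Then browse to localhost:4041 while the job is running.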

Hope this helps.