spark-submit command takes a long time to run



We have installed Apache Hadoop and Spark on a cluster of servers running the IBM AIX (version 2) operating system.

Hadoop version: hadoop-3.2.1
Spark version: spark-3.0.1

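We are testing the overall Spark installation by running the spark-submit --version command from the $SPARK_HOME/bin folder. The command behaves intermittently: the first run completes without delay, but subsequent runs often take a very long time (around 30 to 40 minutes). We have checked CPU and memory on the server; there is no memory shortage and no application hogging the processor. We have not been able to determine where the delay occurs while the command is running.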
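To put numbers on the intermittent behavior described above, one option is to run the command several times in a row and record the wall-clock time of each run. The following is only a minimal sketch: `time_runs` is a hypothetical helper name, and it assumes Bash (it relies on the `SECONDS` variable).

```shell
#!/usr/bin/env bash
# Run a command several times and report wall-clock seconds and exit status
# for each run, so an intermittent slowdown shows up as an outlier.
time_runs() {
  local runs=$1; shift
  local i start status
  for ((i = 1; i <= runs; i++)); do
    start=$SECONDS
    "$@" >/dev/null 2>&1
    status=$?
    echo "run $i: $((SECONDS - start))s exit=$status"
  done
}

# Example (uncomment on the affected host):
# time_runs 5 "$SPARK_HOME/bin/spark-submit" --version
```

Each run blocks until the command returns, so a run that hangs for 30 to 40 minutes will hold up the loop for that long.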

The same Hadoop/Spark setup works on a cluster running Red Hat 7.9. We do not see this problem in that environment.

This is my first question on Stack Overflow. Please let me know if there is any other information I should provide.

Thanks in advance.

=========================== Edit, May 11:

Log of a successful run (debug lines were added to the spark-submit command):

bash-5.0$ spark-submit --version
Entered spark submit
About to execute spark submit command.....
About to load spark env.sh
Loaded spark env.sh
Entered statement to create RUNNER
searching spark_home/jars
Loaded spark jars DIR
Launching class path
Launched class path
Entering build command
Completed build command
About to enter while block
Entered while block for Entered build command
Entered build command
CMD is
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for
For  changing delim to blank
CMD is
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for /u01/app/java8_64/bin/java
Entered if condition for /u01/app/java8_64/bin/java
CMD is /u01/app/java8_64/bin/java
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for -cp
Entered if condition for -cp
CMD is /u01/app/java8_64/bin/java -cp
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/
Entered if condition for /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for -Xmx1g
Entered if condition for -Xmx1g
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for org.apache.spark.deploy.SparkSubmit
Entered if condition for org.apache.spark.deploy.SparkSubmit
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for --version
Entered if condition for --version
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit --version
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for 0
Entered if condition for 0
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit --version 0
build_command is  and org.apache.spark.deploy.SparkSubmit --version
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit --version 0
completed while block
About to execute /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Using Scala version 2.12.10, IBM J9 VM, 1.8.0_251
Branch HEAD
Compiled by user ubuntu on 2020-08-28T08:58:35Z
Revision 2b147c4cd50da32fe2b4167f97c8142102a0510d
Url https://gitbox.apache.org/repos/asf/spark.git
Type --help for more information.

=============================================================


Failed run:

bash-5.0$ spark-submit --version
Entered spark submit
About to execute spark submit command.....
About to load spark env.sh
Loaded spark env.sh
Entered statement to create RUNNER
searching spark_home/jars
Loaded spark jars DIR
Launching class path
Launched class path
Entering build command
Completed build command
About to enter while block
Entered while block for Entered build command
Entered build command
CMD is
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for
For  changing delim to blank
CMD is
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for /u01/app/java8_64/bin/java
Entered if condition for /u01/app/java8_64/bin/java
CMD is /u01/app/java8_64/bin/java
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for -cp
Entered if condition for -cp
CMD is /u01/app/java8_64/bin/java -cp
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/
Entered if condition for /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for -Xmx1g
Entered if condition for -Xmx1g
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for org.apache.spark.deploy.SparkSubmit
Entered if condition for org.apache.spark.deploy.SparkSubmit
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for --version
Entered if condition for --version
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit --version
build_command is  and org.apache.spark.deploy.SparkSubmit --version
Entered while block for 0
Entered if condition for 0
CMD is /u01/app/java8_64/bin/java -cp /u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/conf/:/u01/app/rmb/ria/AnthemSpark/spark-3.0.1-bin-hadoop3.2/jars/*:/u01/app/rmb/ria/AnthemSpark/hadoop-3.2.1/etc/hadoop/ -Xmx1g org.apache.spark.deploy.SparkSubmit --version 0
build_command is  and org.apache.spark.deploy.SparkSubmit --version
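Because the failed log is a prefix of the successful one, the point where it stops can be found mechanically: print the line of the successful log that follows the last line of the failed log. The sketch below assumes both logs were saved to files; `next_expected_line` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the line of the successful log that follows the point where the
# failed log stops. Assumes the failed log is a prefix of the successful one.
next_expected_line() {
  local ok_log=$1 failed_log=$2 n
  # awk counts a final line even if it has no trailing newline.
  n=$(awk 'END { print NR }' "$failed_log")
  sed -n "$((n + 1))p" "$ok_log"
}

# Example with hypothetical file names:
# next_expected_line success.log failed.log
```

For the two logs above, the first line missing from the failed run is the final "CMD is ... --version 0" line, which is followed by "completed while block" in the successful run. That suggests the script stalled at the loop's next read after the trailing "0" had been consumed, before the loop could exit.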

##########################

Edit, March 12

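These are the last few lines of output when running the command under truss -d. The final line shows the process going into a "sleeping" state.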

0.9063:        lseek(0, 0, 1)                   Err#29 ESPIPE
0.9066:        fstatx(0, 0x0FFFFFFFFFFFE8F8, 176, 0) = 0
0.9068:        _sigaction(14, 0x0FFFFFFFFFFFE710, 0x0FFFFFFFFFFFE740) = 0
0.9071:        incinterval(0, 0x0FFFFFFFFFFFE640, 0x0FFFFFFFFFFFE660) = 0
0.9073:        kread(0, " o", 1)                = 1
0.9075:        kread(0, " r", 1)                = 1
0.9078:        kread(0, " g", 1)                = 1
0.9080:        kread(0, " .", 1)                = 1
0.9082:        kread(0, " a", 1)                = 1
0.9084:        kread(0, " p", 1)                = 1
0.9086:        kread(0, " a", 1)                = 1
0.9089:        kread(0, " c", 1)                = 1
0.9091:        kread(0, " h", 1)                = 1
0.9093:        kread(0, " e", 1)                = 1
0.9095:        kread(0, " .", 1)                = 1
0.9097:        kread(0, " s", 1)                = 1
0.9100:        kread(0, " p", 1)                = 1
0.9102:        kread(0, " a", 1)                = 1
0.9104:        kread(0, " r", 1)                = 1
0.9106:        kread(0, " k", 1)                = 1
0.9108:        kread(0, " .", 1)                = 1
0.9111:        kread(0, " d", 1)                = 1
0.9113:        kread(0, " e", 1)                = 1
0.9115:        kread(0, " p", 1)                = 1
0.9117:        kread(0, " l", 1)                = 1
0.9119:        kread(0, " o", 1)                = 1
0.9122:        kread(0, " y", 1)                = 1
0.9124:        kread(0, " .", 1)                = 1
0.9126:        kread(0, " S", 1)                = 1
0.9128:        kread(0, " p", 1)                = 1
0.9130:        kread(0, " a", 1)                = 1
0.9132:        kread(0, " r", 1)                = 1
0.9135:        kread(0, " k", 1)                = 1
0.9137:        kread(0, " S", 1)                = 1
0.9139:        kread(0, " u", 1)                = 1
0.9141:        kread(0, " b", 1)                = 1
0.9143:        kread(0, " m", 1)                = 1
0.9187:        kread(0, " i", 1)                = 1
0.9190:        kread(0, " t", 1)                = 1
0.9192:        kread(0, "", 1)                = 1
0.9195:        incinterval(0, 0x0FFFFFFFFFFFE5C0, 0x0FFFFFFFFFFFE5E0) = 0
0.9197:        _sigaction(14, 0x0FFFFFFFFFFFE690, 0x0FFFFFFFFFFFE6C0) = 0
0.9200:        kfcntl(1, F_GETFL, 0x0000000000000000) = 67110914
0.9204:        kfcntl(1, F_GETFL, 0x0000000000000000) = 67110914
0.9207:        kioctl(0, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
0.9211:        lseek(0, 0, 1)                   Err#29 ESPIPE
0.9214:        fstatx(0, 0x0FFFFFFFFFFFE8F8, 176, 0) = 0
0.9216:        _sigaction(14, 0x0FFFFFFFFFFFE710, 0x0FFFFFFFFFFFE740) = 0
0.9219:        incinterval(0, 0x0FFFFFFFFFFFE640, 0x0FFFFFFFFFFFE660) = 0
0.9222:        kread(0, " -", 1)                = 1
0.9224:        kread(0, " -", 1)                = 1
0.9227:        kread(0, " v", 1)                = 1
0.9229:        kread(0, " e", 1)                = 1
0.9231:        kread(0, " r", 1)                = 1
0.9234:        kread(0, " s", 1)                = 1
0.9236:        kread(0, " i", 1)                = 1
0.9238:        kread(0, " o", 1)                = 1
0.9241:        kread(0, " n", 1)                = 1
0.9243:        kread(0, "", 1)                = 1
0.9245:        incinterval(0, 0x0FFFFFFFFFFFE5C0, 0x0FFFFFFFFFFFE5E0) = 0
0.9248:        _sigaction(14, 0x0FFFFFFFFFFFE690, 0x0FFFFFFFFFFFE6C0) = 0
0.9251:        kfcntl(1, F_GETFL, 0x0000000000000000) = 67110914
0.9254:        kfcntl(1, F_GETFL, 0x0000000000000000) = 67110914
0.9257:        kioctl(0, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
0.9260:        lseek(0, 0, 1)                   Err#29 ESPIPE
0.9262:        fstatx(0, 0x0FFFFFFFFFFFE8F8, 176, 0) = 0
0.9265:        _sigaction(14, 0x0FFFFFFFFFFFE710, 0x0FFFFFFFFFFFE740) = 0
0.9268:        incinterval(0, 0x0FFFFFFFFFFFE640, 0x0FFFFFFFFFFFE660) = 0
0.9270:        kread(0, " 0", 1)                = 1
0.9273:        kread(0, "", 1)                = 1
0.9275:        incinterval(0, 0x0FFFFFFFFFFFE5C0, 0x0FFFFFFFFFFFE5E0) = 0
0.9278:        _sigaction(14, 0x0FFFFFFFFFFFE690, 0x0FFFFFFFFFFFE6C0) = 0
0.9281:        kfcntl(1, F_GETFL, 0x0000000000000000) = 67110914
0.9284:        kfcntl(1, F_GETFL, 0x0000000000000020) = 67110914
0.9287:        kioctl(0, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
0.9290:        lseek(0, 0, 1)                   Err#29 ESPIPE
0.9292:        fstatx(0, 0x0FFFFFFFFFFFE8F8, 176, 0) = 0
0.9295:        _sigaction(14, 0x0FFFFFFFFFFFE710, 0x0FFFFFFFFFFFE740) = 0
0.9297:        incinterval(0, 0x0FFFFFFFFFFFE640, 0x0FFFFFFFFFFFE660) = 0
2.9303:        kread(0, "t", 1) (sleeping...)
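The trace shows file descriptor 0 behaving like a pipe (lseek fails with ESPIPE) and being read one byte at a time. The bytes spell out org.apache.spark.deploy.SparkSubmit, --version and 0, and the process then blocks in a further kread. That pattern is consistent with a Bash loop that uses `read -d ''` to collect NUL-delimited arguments from a child process. The following is a minimal sketch of such a loop, not the actual Spark script: `fake_build_command` is a hypothetical stand-in for the launcher, and the argument list is shortened from the debug log above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the launcher: emits each argument followed by a
# NUL byte, then an exit code, mirroring the values seen in the debug log.
fake_build_command() {
  printf '%s\0' /u01/app/java8_64/bin/java -Xmx1g \
    org.apache.spark.deploy.SparkSubmit --version
  printf '%d\0' 0
}

# Collect NUL-delimited fields from a pipe into the CMD array.
# `read -d ''` makes Bash read until a NUL byte; on a pipe it does so one
# byte at a time, and the loop ends only when the writer closes the pipe.
CMD=()
while IFS= read -r -d '' ARG; do
  CMD+=("$ARG")
done < <(fake_build_command)

# The last field is the launcher's exit code; strip it before running.
LAST=$((${#CMD[@]} - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}
CMD=("${CMD[@]:0:$LAST}")

echo "exit code: $LAUNCHER_EXIT_CODE"
printf 'arg: %s\n' "${CMD[@]}"
```

In a loop of this shape, the final `read` has to see end-of-file before the loop can exit. If `read` never returns at that point, the script sits in a sleeping read on the pipe, which matches the last line of the trace.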

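@LorinczyZsigmond's suggestion finally pointed us in the right direction. By searching for the last line of the trace, kread(0, "t", 1) (sleeping...), we found that there is a problem with the Bash read builtin in the latest version. On the IBM AIX Linux Toolbox site there was a discussion where others reported hitting the same problem with that release. After upgrading to version 5.1.4.2, the problem went away and we were able to run the script to completion.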
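To see which Bash a host is actually running before and after such an upgrade, a quick check along these lines may help. It is a sketch under assumptions: `bash_at_least` is a hypothetical helper, and reading "5.1.4.2" as a Bash 5.1 package build is my interpretation of the version string rather than something stated above. Passing the check does not by itself prove the read problem is absent.

```shell
#!/usr/bin/env bash
# Return success if the running Bash is at least MAJOR.MINOR.
bash_at_least() {
  local major=$1 minor=$2
  (( BASH_VERSINFO[0] > major )) ||
    (( BASH_VERSINFO[0] == major && BASH_VERSINFO[1] >= minor ))
}

echo "Bash ${BASH_VERSION}"
if bash_at_least 5 1; then
  echo "at or above 5.1"
else
  echo "older than 5.1"
fi
```

Note that this reports the Bash that runs the snippet; the spark-submit scripts use whichever bash is first on the PATH, so it is worth running the check in the same environment.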