spark-submit ran fine on my local cluster without any issues. Due to resource constraints, I moved to cloud-based compute and am now running a Spark cluster on Google Cloud Dataproc with 1 master and 4 workers. When I submit a job, I get the following error:
- My submit command:
spark-submit --master yarn --deploy-mode cluster --class com.aavash.ann.sparkann.GraphNetworkSCL cleanSCL2.jar Oldenburg_Nodes.txt Oldenburg_Edges.txt Oldenburg_part_4.txt
- More details from the YARN log file:
2022-09-28 05:06:00,129 INFO client.RMProxy: Connecting to ResourceManager at spark-cluster-m/10.146.0.5:8032
2022-09-28 05:06:00,769 INFO client.AHSProxy: Connecting to Application History server at spark-cluster-m/10.146.0.5:10200
2022-09-28 05:06:04,078 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
End of LogType:prelaunch.err
******************************************************************************
Container: container_1664339426994_0003_02_000001 on spark-cluster-w-0.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:stderr
LogLastModifiedTime:Wed Sep 28 04:52:32 +0000 2022
LogLength:1179
LogContents:
22/09/28 04:52:32 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.lang.ClassNotFoundException: com.aavash.ann.sparkann.GraphNetworkSCL
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:722)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:496)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
End of LogType:stderr
***********************************************************************
Container: container_1664339426994_0003_02_000001 on spark-cluster-w-0.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Sep 28 04:52:30 +0000 2022
LogLength:5110
LogContents:
ls -l:
total 32
lrwxrwxrwx 1 yarn yarn 77 Sep 28 04:52 __app__.jar -> /hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/15/cleanSCL2.jar
lrwxrwxrwx 1 yarn yarn 82 Sep 28 04:52 __spark_conf__ -> /hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/14/__spark_conf__.zip
-rw-r--r-- 1 yarn yarn 69 Sep 28 04:52 container_tokens
-rwx------ 1 yarn yarn 717 Sep 28 04:52 default_container_executor.sh
-rwx------ 1 yarn yarn 662 Sep 28 04:52 default_container_executor_session.sh
-rwx------ 1 yarn yarn 5140 Sep 28 04:52 launch_container.sh
drwx--x--- 2 yarn yarn 4096 Sep 28 04:52 tmp
find -L . -maxdepth 5 -ls:
4128911 4 drwx--x--- 3 yarn yarn 4096 Sep 28 04:52 .
4128919 4 -rwx------ 1 yarn yarn 717 Sep 28 04:52 ./default_container_executor.sh
4128920 4 -rw-r--r-- 1 yarn yarn 16 Sep 28 04:52 ./.default_container_executor.sh.crc
4128918 4 -rw-r--r-- 1 yarn yarn 16 Sep 28 04:52 ./.default_container_executor_session.sh.crc
4128916 4 -rw-r--r-- 1 yarn yarn 52 Sep 28 04:52 ./.launch_container.sh.crc
4128915 8 -rwx------ 1 yarn yarn 5140 Sep 28 04:52 ./launch_container.sh
4128914 4 -rw-r--r-- 1 yarn yarn 12 Sep 28 04:52 ./.container_tokens.crc
4128913 4 -rw-r--r-- 1 yarn yarn 69 Sep 28 04:52 ./container_tokens
4128903 292 -r-x------ 1 yarn yarn 297978 Sep 28 04:52 ./__app__.jar
4128873 4 drwx------ 3 yarn yarn 4096 Sep 28 04:52 ./__spark_conf__
4128899 148 -r-x------ 1 yarn yarn 150701 Sep 28 04:52 ./__spark_conf__/__spark_hadoop_conf__.xml
4128901 4 -r-x------ 1 yarn yarn 470 Sep 28 04:52 ./__spark_conf__/__spark_dist_cache__.properties
4128876 4 -r-x------ 1 yarn yarn 704 Sep 28 04:52 ./__spark_conf__/metrics.properties
4128877 4 drwx------ 2 yarn yarn 4096 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__
4128892 4 -r-x------ 1 yarn yarn 2163 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/mapred-env.sh
4128896 4 -r-x------ 1 yarn yarn 977 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/fairscheduler.xml
4128894 4 -r-x------ 1 yarn yarn 1535 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/distcp-default.xml
4128890 8 -r-x------ 1 yarn yarn 7522 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/yarn-env.sh
4128878 20 -r-x------ 1 yarn yarn 17233 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hadoop-env.sh
4128895 12 -r-x------ 1 yarn yarn 11392 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hadoop-policy.xml
4128888 4 -r-x------ 1 yarn yarn 1335 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/configuration.xsl
4128887 0 -r-x------ 1 yarn yarn 0 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/nodes_exclude
4128893 4 -r-x------ 1 yarn yarn 2316 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/ssl-client.xml.example
4128891 4 -r-x------ 1 yarn yarn 1940 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/container-executor.cfg
4128879 12 -r-x------ 1 yarn yarn 8338 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/mapred-site.xml
4128881 4 -r-x------ 1 yarn yarn 3321 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hadoop-metrics2.properties
4128880 16 -r-x------ 1 yarn yarn 14772 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/log4j.properties
4128884 8 -r-x------ 1 yarn yarn 4131 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/core-site.xml
4128882 4 -r-x------ 1 yarn yarn 82 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/yarn-timelineserver.logging.properties
4128897 4 -r-x------ 1 yarn yarn 2697 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/ssl-server.xml.example
4128886 0 -r-x------ 1 yarn yarn 0 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/nodes_include
4128889 8 -r-x------ 1 yarn yarn 7052 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hdfs-site.xml
4128898 8 -r-x------ 1 yarn yarn 4113 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/mapred-queues.xml.template
4128883 12 -r-x------ 1 yarn yarn 8291 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/yarn-site.xml
4128885 12 -r-x------ 1 yarn yarn 9533 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/capacity-scheduler.xml
4128875 4 -r-x------ 1 yarn yarn 1225 Sep 28 04:52 ./__spark_conf__/log4j.properties
4128900 4 -r-x------ 1 yarn yarn 1530 Sep 28 04:52 ./__spark_conf__/__spark_conf__.properties
4128917 4 -rwx------ 1 yarn yarn 662 Sep 28 04:52 ./default_container_executor_session.sh
4128912 4 drwx--x--- 2 yarn yarn 4096 Sep 28 04:52 ./tmp
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info
*******************************************************************************
End of LogType:stdout
***********************************************************************
Container: container_1664339426994_0003_02_000001 on spark-cluster-w-0.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Wed Sep 28 04:52:30 +0000 2022
LogLength:5140
LogContents:
#!/bin/bash
set -o pipefail -e
export PRELAUNCH_OUT="/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export PATH=${PATH:-"/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin"}
export JAVA_HOME=${JAVA_HOME:-"/usr/lib/jvm/temurin-8-jdk-amd64"}
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/usr/lib/hadoop"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/usr/lib/hadoop-hdfs"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/lib/hadoop-yarn"}
export HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-mapreduce"}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-":/usr/lib/hadoop/lib/native"}
export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/appcache/application_1664339426994_0003/container_1664339426994_0003_02_000001/container_tokens"
export CONTAINER_ID="container_1664339426994_0003_02_000001"
export NM_PORT="8026"
export NM_HOST="spark-cluster-w-0.c.apache-spark-project-363713.internal"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/appcache/application_1664339426994_0003"
export LOCAL_USER_DIRS="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/"
export LOG_DIRS="/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001"
export USER="aavashbhandari"
export LOGNAME="aavashbhandari"
export HOME="/home/"
export PWD="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/appcache/application_1664339426994_0003/container_1664339426994_0003_02_000001"
export LOCALIZATION_COUNTERS="563687,0,2,0,125"
export JVM_PID="$$"
export NM_AUX_SERVICE_spark_shuffle=""
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export SPARK_YARN_STAGING_DIR="hdfs://spark-cluster-m/user/aavashbhandari/.sparkStaging/application_1664339426994_0003"
export APP_SUBMIT_TIME_ENV="1664340741193"
export PYSPARK_PYTHON="/opt/conda/default/bin/python"
export PYTHONHASHSEED="0"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1664339426994_0003"
export SPARK_DIST_CLASSPATH=":/etc/hive/conf:/usr/local/share/google/dataproc/lib/*:/usr/share/java/mysql.jar"
export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:/usr/lib/spark/jars/*::/etc/hive/conf:/usr/local/share/google/dataproc/lib/*:/usr/share/java/mysql.jar:$PWD/__spark_conf__/__hadoop_conf__"
export SPARK_USER="aavashbhandari"
export MALLOC_ARENA_MAX="4"
echo "Setting up job resources"
ln -sf -- "/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/14/__spark_conf__.zip" "__spark_conf__"
ln -sf -- "/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/15/cleanSCL2.jar" "__app__.jar"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/launch_container.sh"
chmod 640 "/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/directory.info"
ls -l 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx2048m -Djava.io.tmpdir=$PWD/tmp -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.aavash.ann.sparkann.GraphNetworkSCL' --jar file:/home/aavashbhandari/cleanSCL2.jar --arg 'Oldenburg_Nodes.txt' --arg 'Oldenburg_Edges.txt' --arg 'Oldenburg_part_4.txt' --properties-file $PWD/__spark_conf__/__spark_conf__.properties --dist-cache-conf $PWD/__spark_conf__/__spark_dist_cache__.properties 1> /var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/stdout 2> /var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_02_000001/stderr"
End of LogType:launch_container.sh
************************************************************************************
Container: container_1664339426994_0003_02_000001 on spark-cluster-w-0.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Sep 28 04:52:30 +0000 2022
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out
******************************************************************************
End of LogType:stdout
***********************************************************************
Container: container_1664339426994_0003_01_000001 on spark-cluster-w-1.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Sep 28 04:52:25 +0000 2022
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out
******************************************************************************
Container: container_1664339426994_0003_01_000001 on spark-cluster-w-1.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:stderr
LogLastModifiedTime:Wed Sep 28 04:52:29 +0000 2022
LogLength:1179
LogContents:
22/09/28 04:52:29 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.lang.ClassNotFoundException: com.aavash.ann.sparkann.GraphNetworkSCL
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:722)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:496)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
End of LogType:stderr
***********************************************************************
End of LogType:prelaunch.err
******************************************************************************
Container: container_1664339426994_0003_01_000001 on spark-cluster-w-1.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Sep 28 04:52:25 +0000 2022
LogLength:5110
LogContents:
ls -l:
total 32
lrwxrwxrwx 1 yarn yarn 77 Sep 28 04:52 __app__.jar -> /hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/11/cleanSCL2.jar
lrwxrwxrwx 1 yarn yarn 82 Sep 28 04:52 __spark_conf__ -> /hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/10/__spark_conf__.zip
-rw-r--r-- 1 yarn yarn 69 Sep 28 04:52 container_tokens
-rwx------ 1 yarn yarn 717 Sep 28 04:52 default_container_executor.sh
-rwx------ 1 yarn yarn 662 Sep 28 04:52 default_container_executor_session.sh
-rwx------ 1 yarn yarn 5141 Sep 28 04:52 launch_container.sh
drwx--x--- 2 yarn yarn 4096 Sep 28 04:52 tmp
find -L . -maxdepth 5 -ls:
4128841 4 drwx--x--- 3 yarn yarn 4096 Sep 28 04:52 .
4128849 4 -rwx------ 1 yarn yarn 717 Sep 28 04:52 ./default_container_executor.sh
4128850 4 -rw-r--r-- 1 yarn yarn 16 Sep 28 04:52 ./.default_container_executor.sh.crc
4128848 4 -rw-r--r-- 1 yarn yarn 16 Sep 28 04:52 ./.default_container_executor_session.sh.crc
4128846 4 -rw-r--r-- 1 yarn yarn 52 Sep 28 04:52 ./.launch_container.sh.crc
4128845 8 -rwx------ 1 yarn yarn 5141 Sep 28 04:52 ./launch_container.sh
4128844 4 -rw-r--r-- 1 yarn yarn 12 Sep 28 04:52 ./.container_tokens.crc
4128843 4 -rw-r--r-- 1 yarn yarn 69 Sep 28 04:52 ./container_tokens
4128833 292 -r-x------ 1 yarn yarn 297978 Sep 28 04:52 ./__app__.jar
4128803 4 drwx------ 3 yarn yarn 4096 Sep 28 04:52 ./__spark_conf__
4128829 148 -r-x------ 1 yarn yarn 150701 Sep 28 04:52 ./__spark_conf__/__spark_hadoop_conf__.xml
4128831 4 -r-x------ 1 yarn yarn 470 Sep 28 04:52 ./__spark_conf__/__spark_dist_cache__.properties
4128806 4 -r-x------ 1 yarn yarn 704 Sep 28 04:52 ./__spark_conf__/metrics.properties
4128807 4 drwx------ 2 yarn yarn 4096 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__
4128822 4 -r-x------ 1 yarn yarn 2163 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/mapred-env.sh
4128826 4 -r-x------ 1 yarn yarn 977 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/fairscheduler.xml
4128824 4 -r-x------ 1 yarn yarn 1535 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/distcp-default.xml
4128820 8 -r-x------ 1 yarn yarn 7522 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/yarn-env.sh
4128808 20 -r-x------ 1 yarn yarn 17233 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hadoop-env.sh
4128825 12 -r-x------ 1 yarn yarn 11392 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hadoop-policy.xml
4128818 4 -r-x------ 1 yarn yarn 1335 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/configuration.xsl
4128817 0 -r-x------ 1 yarn yarn 0 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/nodes_exclude
4128823 4 -r-x------ 1 yarn yarn 2316 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/ssl-client.xml.example
4128821 4 -r-x------ 1 yarn yarn 1940 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/container-executor.cfg
4128809 12 -r-x------ 1 yarn yarn 8338 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/mapred-site.xml
4128811 4 -r-x------ 1 yarn yarn 3321 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hadoop-metrics2.properties
4128810 16 -r-x------ 1 yarn yarn 14772 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/log4j.properties
4128814 8 -r-x------ 1 yarn yarn 4131 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/core-site.xml
4128812 4 -r-x------ 1 yarn yarn 82 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/yarn-timelineserver.logging.properties
4128827 4 -r-x------ 1 yarn yarn 2697 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/ssl-server.xml.example
4128816 0 -r-x------ 1 yarn yarn 0 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/nodes_include
4128819 8 -r-x------ 1 yarn yarn 7052 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/hdfs-site.xml
4128828 8 -r-x------ 1 yarn yarn 4113 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/mapred-queues.xml.template
4128813 12 -r-x------ 1 yarn yarn 8291 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/yarn-site.xml
4128815 12 -r-x------ 1 yarn yarn 9533 Sep 28 04:52 ./__spark_conf__/__hadoop_conf__/capacity-scheduler.xml
4128805 4 -r-x------ 1 yarn yarn 1225 Sep 28 04:52 ./__spark_conf__/log4j.properties
4128830 4 -r-x------ 1 yarn yarn 1530 Sep 28 04:52 ./__spark_conf__/__spark_conf__.properties
4128847 4 -rwx------ 1 yarn yarn 662 Sep 28 04:52 ./default_container_executor_session.sh
4128842 4 drwx--x--- 2 yarn yarn 4096 Sep 28 04:52 ./tmp
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info
*******************************************************************************
Container: container_1664339426994_0003_01_000001 on spark-cluster-w-1.c.apache-spark-project-363713.internal_8026
LogAggregationType: AGGREGATED
==================================================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Wed Sep 28 04:52:25 +0000 2022
LogLength:5141
LogContents:
#!/bin/bash
set -o pipefail -e
export PRELAUNCH_OUT="/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export PATH=${PATH:-"/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin"}
export JAVA_HOME=${JAVA_HOME:-"/usr/lib/jvm/temurin-8-jdk-amd64"}
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/usr/lib/hadoop"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/usr/lib/hadoop-hdfs"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/lib/hadoop-yarn"}
export HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-mapreduce"}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-":/usr/lib/hadoop/lib/native"}
export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/appcache/application_1664339426994_0003/container_1664339426994_0003_01_000001/container_tokens"
export CONTAINER_ID="container_1664339426994_0003_01_000001"
export NM_PORT="8026"
export NM_HOST="spark-cluster-w-1.c.apache-spark-project-363713.internal"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/appcache/application_1664339426994_0003"
export LOCAL_USER_DIRS="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/"
export LOG_DIRS="/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001"
export USER="aavashbhandari"
export LOGNAME="aavashbhandari"
export HOME="/home/"
export PWD="/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/appcache/application_1664339426994_0003/container_1664339426994_0003_01_000001"
export LOCALIZATION_COUNTERS="563687,0,2,0,1366"
export JVM_PID="$$"
export NM_AUX_SERVICE_spark_shuffle=""
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export SPARK_YARN_STAGING_DIR="hdfs://spark-cluster-m/user/aavashbhandari/.sparkStaging/application_1664339426994_0003"
export APP_SUBMIT_TIME_ENV="1664340741193"
export PYSPARK_PYTHON="/opt/conda/default/bin/python"
export PYTHONHASHSEED="0"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1664339426994_0003"
export SPARK_DIST_CLASSPATH=":/etc/hive/conf:/usr/local/share/google/dataproc/lib/*:/usr/share/java/mysql.jar"
export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:/usr/lib/spark/jars/*::/etc/hive/conf:/usr/local/share/google/dataproc/lib/*:/usr/share/java/mysql.jar:$PWD/__spark_conf__/__hadoop_conf__"
export SPARK_USER="aavashbhandari"
export MALLOC_ARENA_MAX="4"
echo "Setting up job resources"
ln -sf -- "/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/11/cleanSCL2.jar" "__app__.jar"
ln -sf -- "/hadoop/yarn/nm-local-dir/usercache/aavashbhandari/filecache/10/__spark_conf__.zip" "__spark_conf__"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/launch_container.sh"
chmod 640 "/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/directory.info"
ls -l 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx2048m -Djava.io.tmpdir=$PWD/tmp -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.aavash.ann.sparkann.GraphNetworkSCL' --jar file:/home/aavashbhandari/cleanSCL2.jar --arg 'Oldenburg_Nodes.txt' --arg 'Oldenburg_Edges.txt' --arg 'Oldenburg_part_4.txt' --properties-file $PWD/__spark_conf__/__spark_conf__.properties --dist-cache-conf $PWD/__spark_conf__/__spark_dist_cache__.properties 1> /var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/stdout 2> /var/log/hadoop-yarn/userlogs/application_1664339426994_0003/container_1664339426994_0003_01_000001/stderr"
End of LogType:launch_container.sh
************************************************************************************
- My POM file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.aavash.ann</groupId>
<artifactId>sparkann</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>SparkANN</name>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.2</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.12.14</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src</sourceDirectory>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.7.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
Replace the build section in your pom.xml with the following. It produces an assembly (fat) JAR via the `jar-with-dependencies` descriptor, so your main class and its dependencies are packaged together and YARN can find `com.aavash.ann.sparkann.GraphNetworkSCL` at launch.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.aavash.ann</groupId>
<artifactId>sparkann</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>SparkANN</name>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.2</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.12.14</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>2.12.14</scalaVersion>
<args>
<arg>-target:jvm-1.8</arg>
</args>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<configuration>
<downloadSources>true</downloadSources>
<buildcommands>
<buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
</buildcommands>
<additionalProjectnatures>
<projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
</additionalProjectnatures>
<classpathContainers>
<classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
<classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
</classpathContainers>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4.1</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Build it with:
mvn clean package