I am trying to run the simple YARN application from simple-yarn-app, but I get the following exception in my application's error log.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/conf/YarnConfiguration
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
at java.lang.Class.getMethod0(Class.java:2774)
at java.lang.Class.getMethod(Class.java:1663)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.conf.YarnConfiguration
However, when I run the "yarn classpath" command on all datanodes, I see the following output:
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*
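This includes the paths to the yarn-client, yarn-api, yarn-common and hadoop-common jars that the application needs. Can anyone tell me where I might have forgotten to set the correct classpath?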
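One way to narrow this down is to probe, from inside the JVM that actually fails, whether the class is visible and what classpath that JVM was given. The sketch below is only an illustration: the class name ClassProbe and its method isLoadable are made up for this example, and the probe itself must not depend on any Hadoop jar, otherwise it fails the same way.

```java
public class ClassProbe {
    /** Returns true if the named class can be found by this class's loader (without initializing it). */
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here as well.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.yarn.conf.YarnConfiguration";
        System.out.println(name + " loadable: " + isLoadable(name));
        // The classpath this JVM was really started with, which may differ from `yarn classpath`.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the printed java.class.path still contains literal variable names or lacks the Hadoop directories, the classpath handed to the container is the likely culprit rather than the node's own installation.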
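I found that Hadoop does not resolve the $HADOOP_HOME and $YARN_HOME environment variables when iterating over the YarnConfiguration properties. Running the following in the YARN client prints the unresolved configuration entries, such as $HADOOP_HOME/ and $HADOOP_HOME/lib/: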
YarnConfiguration conf = new YarnConfiguration();
for (String c : conf.getStrings(
YarnConfiguration.YARN_APPLICATION_CLASSPATH,
YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
System.out.println(c);
}
So, if you provide full paths for the yarn.application.classpath property, the NoClassDefFoundError problem is resolved.
<property>
<description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries</description>
<name>yarn.application.classpath</name>
<value>
/etc/hadoop/conf,
/usr/lib/hadoop/*,
/usr/lib/hadoop/lib/*,
/usr/lib/hadoop-hdfs/*,
/usr/lib/hadoop-hdfs/lib/*,
/usr/lib/hadoop-mapreduce/*,
/usr/lib/hadoop-mapreduce/lib/*,
/usr/lib/hadoop-yarn/*,
/usr/lib/hadoop-yarn/lib/*
</value>
</property>
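To see which variable references in a classpath list would stay unresolved in a given environment, a small check along these lines might help. This is a minimal sketch and not part of Hadoop: the class ClasspathEnvCheck and its method unresolved are hypothetical names, and the sample entries in main are only placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClasspathEnvCheck {
    // Matches $NAME or ${NAME} inside a classpath entry.
    private static final Pattern VAR =
            Pattern.compile("\\$\\{?([A-Za-z_][A-Za-z0-9_]*)\\}?");

    /** Returns the variable names referenced by the entries but absent from env, in first-seen order. */
    public static List<String> unresolved(String[] entries, Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String entry : entries) {
            Matcher m = VAR.matcher(entry);
            while (m.find()) {
                String name = m.group(1);
                if (!env.containsKey(name) && !missing.contains(name)) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder entries; in practice these would come from conf.getStrings(...).
        String[] entries = {
            "$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/*", "$YARN_HOME/lib/*", "/usr/lib/hadoop/*"
        };
        System.out.println("Unresolved variables: " + unresolved(entries, System.getenv()));
    }
}
```

Any name it reports is a candidate for replacing with a full path, as in the property above.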
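This problem occurs on YARN clusters where the ResourceManager and/or NodeManager daemons are started with an incomplete application classpath. Even something as simple as a spark-shell invocation like this one will fail: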
user@linux$ spark-shell --master yarn-client
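Unfortunately, you only find out when you launch your application, or once it has run long enough to reach the missing class. To work around this, I took the output of the following command,
user@linux$ yarn classpath
cleaned it up (it contains duplicates and non-canonical entries), appended it to the yarn.application.classpath directive below in /etc/hadoop/conf/yarn-site.xml, and finally restarted the YARN cluster daemons: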
user@linux$ sudo vi /etc/hadoop/conf/yarn-site.xml
[ ... ]
<property>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,
$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,
$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,
$HADOOP_MAPRED_HOME/lib/*,
$YARN_HOME/*,
$YARN_HOME/lib/*,
/etc/hadoop/conf,
/usr/lib/hadoop/*,
/usr/lib/hadoop/lib,
/usr/lib/hadoop/lib/*,
/usr/lib/hadoop-hdfs,
/usr/lib/hadoop-hdfs/*,
/usr/lib/hadoop-hdfs/lib/*,
/usr/lib/hadoop-yarn/*,
/usr/lib/hadoop-yarn/lib/*,
/usr/lib/hadoop-mapreduce/*,
/usr/lib/hadoop-mapreduce/lib/*
</value>
</property>
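The clean-up step can also be scripted instead of done by hand. The following is a rough sketch under my own assumptions about what "cleaning" means here (drop duplicates, collapse "/.//" to "/", strip a trailing "/./"); the class YarnClasspathCleaner is a hypothetical helper, not something shipped with Hadoop.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class YarnClasspathCleaner {
    /** Converts colon-separated `yarn classpath` output into a de-duplicated, comma-separated list. */
    public static String clean(String raw) {
        Set<String> seen = new LinkedHashSet<>();   // keeps first-seen order, drops duplicates
        for (String entry : raw.trim().split(":")) {
            if (entry.isEmpty()) {
                continue;
            }
            // "/usr/lib/hadoop/.//*" becomes "/usr/lib/hadoop/*"
            String e = entry.replace("/.//", "/");
            // "/usr/lib/hadoop-hdfs/./" becomes "/usr/lib/hadoop-hdfs"
            if (e.endsWith("/./")) {
                e = e.substring(0, e.length() - 3);
            }
            seen.add(e);
        }
        return String.join(",\n", seen);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java YarnClasspathCleaner \"$(yarn classpath)\"");
            return;
        }
        System.out.println(clean(args[0]));
    }
}
```

The printed list can then be pasted into the <value> element; it is still worth reviewing by eye before copying the file to the other nodes.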
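The entries above that do not reference an environment variable are the ones I added. Remember to copy this modified file to all nodes in the YARN cluster before restarting the ResourceManager and NodeManager daemons.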
In general, you need to package every dependency that is not already provided (classes and modules) into your application archive. =:)