I'm new to big data development and have run into what may be some simple problems, so I'm asking the experts for help. I want to test the performance of Hadoop accessing Ceph via s3a locally. Here is what I did:
- Installed three Ubuntu 18.04 servers as VMs, initialized the network and IPs, installed JDK 1.8, and installed Hadoop 3.2.2;
- Set up passwordless SSH login, and edited the hosts file to configure hostnames;
- Configured the JDK and Hadoop environment variables, and modified core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh under hadoop-3.X/etc/hadoop/.
- The configuration is pasted below:
core-site.xml

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/program/hadoop/data</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop04:9000</value>
</property>
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>root</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>ak</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>sk</value>
</property>
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>https://172.17.37.60:8080</value>
</property>
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
```

hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop04:9870</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/program/hadoop/data/dataname</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/program/hadoop/data/datanode</value>
  </property>
</configuration>
```

mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop04:9001</value>
  </property>
</configuration>
```
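One setting worth double-checking for Ceph specifically: S3A defaults to virtual-host-style bucket addressing, which Ceph RGW deployments often are not configured for. A hedged addition to core-site.xml (an assumption about your RGW setup, not part of the original config):

```xml
<!-- assumption: many Ceph RGW setups require path-style S3 requests -->
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
```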
- Without s3a, invoking Hadoop from the Hadoop client works fine. To use s3a, I added the hadoop-client and hadoop-aws dependencies from the Hadoop website to my pom.
- What I tried: I put hadoop-client-2.X.jar and hadoop-aws-2.X.jar into hadoop-3.2.2/share/hadoop/client/ on the servers, restarted Hadoop, and then ran `hadoop fs -ls s3a://endpoint:port`, which threw an exception:
```
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2638)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3341)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3373)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:125)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3424)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3392)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:485)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
	at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
	at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
	at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
	at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2542)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2636)
	... 16 more
```
- I understand that the S3AFileSystem exception means the jars are in the wrong place. Is there anything else wrong?
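For reference, the hadoop-client / hadoop-aws pom entries mentioned above would look roughly like this (a sketch; the version shown is an assumption and should match the cluster's Hadoop release, i.e. 3.2.2 here, not the 2.X jars described above):

```xml
<!-- sketch: versions must match the cluster's Hadoop release -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.2.2</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>3.2.2</version>
</dependency>
```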
I added the following JARs to Hive's lib directory, and it worked for me (Hive 3.0):
hadoop-aws-3.2.1.jar
guava-27.0-jre.jar
aws-java-sdk-bundle-1.11.375.jar
You can also place these jars in a different location and add that path in hive-env.sh:
export HIVE_AUX_JARS_PATH=<Path to all JAR files>
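For plain Hadoop (rather than Hive), a comparable approach is to use the hadoop-aws and aws-java-sdk-bundle jars already shipped under share/hadoop/tools/lib and enable them through hadoop-env.sh. A minimal sketch; HADOOP_OPTIONAL_TOOLS is a standard Hadoop 3.x hook, but check that the tools jars in your install match your Hadoop version:

```shell
# hadoop-env.sh: HADOOP_OPTIONAL_TOOLS adds the named tools modules
# (the jars under share/hadoop/tools/lib, including hadoop-aws and the
# matching aws-java-sdk-bundle) to the Hadoop classpath.
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
```

This avoids copying mismatched jars by hand, which is what caused the ClassNotFoundException above.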