Difficulties encountered when using s3a



I'm new to big data development and have run into some problems that may be simple; I'm hoping more experienced people can help. I want to test the performance of Hadoop accessing Ceph through s3a locally, and I did the following:

  1. Installed three Ubuntu 18.04 servers as virtual machines, initialized the network and IP addresses, installed JDK 1.8, and installed Hadoop 3.2.2;

  2. Set up passwordless SSH login, and modified the hosts file and configured the hostnames;

  3. Configured the environment variables for the JDK and Hadoop, and modified core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.sh under the hadoop-3.X/etc/hadoop/ directory.

  4. The configuration is pasted below:

    core-site.xml
    <property>
      <name>hadoop.tmp.dir</name>
      <value>file:/home/program/hadoop/data</value>
      <description>A base for other temporary directories.</description>
    </property>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop04:9000</value>
    </property>
    <property>
      <name>hadoop.http.staticuser.user</name>
      <value>root</value>
    </property>
    <property>
      <name>fs.s3a.access.key</name>
      <value>ak</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>sk</value>
    </property>
    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
    <property>
      <name>fs.s3a.endpoint</name>
      <value>https://172.17.37.60:8080</value>
    </property>
    <property>
      <name>fs.s3a.connection.ssl.enabled</name>
      <value>false</value>
    </property>
    hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop04:9870</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/program/hadoop/data/dataname</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/program/hadoop/data/datanode</value>
      </property>
    </configuration>
    mapred-site.xml
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>hadoop04:9001</value>
      </property>
    </configuration>
    
  5. Without using s3a, invoking Hadoop from the Hadoop client works fine. To use s3a, I added hadoop-client and hadoop-aws to the pom dependencies as described on the Hadoop website.

  6. My attempt was to put hadoop-client-2.X.jar and hadoop-aws-2.X.jar into the server's hadoop-3.2.2/share/hadoop/client/ directory. After restarting Hadoop and running the command `hadoop fs -ls s3a://endpoint:port`, I got an exception:

    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2638)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3341)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3373)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:125)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3424)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3392)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:485)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2542)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2636)
    ... 16 more
    
  7. I understand that the cause of the S3AFileSystem exception is that the jar files were put in the wrong place. Are there any other problems?
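On the Hadoop side, the diagnosis in step 7 is right, but for Hadoop 3.x the usual approach is not to copy 2.x jars in at all: the release already ships hadoop-aws and its matching aws-java-sdk-bundle under share/hadoop/tools/lib, and they are enabled through hadoop-env.sh. A sketch, assuming a standard Hadoop 3.2.2 layout with $HADOOP_HOME set; the bucket name mybucket is a placeholder:

```shell
# Enable the bundled S3A connector: when hadoop-aws is listed as an
# optional tool, Hadoop 3.x adds share/hadoop/tools/lib/hadoop-aws-*.jar
# and the matching aws-java-sdk-bundle to the classpath.
echo 'export HADOOP_OPTIONAL_TOOLS="hadoop-aws"' \
  >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"

# Verify that the jar containing S3AFileSystem is now on the classpath.
hadoop classpath --glob | tr ':' '\n' | grep hadoop-aws

# List a bucket: the s3a:// URI takes a bucket name, not endpoint:port;
# the Ceph endpoint comes from fs.s3a.endpoint in core-site.xml.
hadoop fs -ls s3a://mybucket/
```

Mixing hadoop-aws 2.x jars with a 3.2.2 runtime tends to fail even once the class is found, so keeping the connector version matched to the Hadoop build matters.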

I added the following JARs to Hive's lib directory, and it worked for me (Hive 3.0):

hadoop-aws-3.2.1.jar
guava-27.0-jre.jar
aws-java-sdk-bundle-1.11.375.jar
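
Copying them into place can be sketched as follows (a sketch, assuming the three JARs sit in the current directory and $HIVE_HOME points at the Hive installation):

```shell
# Copy the connector jars into Hive's lib directory; skip any that are
# not present. Match the hadoop-aws version to your Hadoop build.
for jar in hadoop-aws-3.2.1.jar \
           guava-27.0-jre.jar \
           aws-java-sdk-bundle-1.11.375.jar; do
  if [ -f "$jar" ]; then
    cp "$jar" "$HIVE_HOME/lib/"
  fi
done
```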

You can also put these JARs in a different location and add the path in hive-env.sh:

export HIVE_AUX_JARS_PATH=<Path to all JAR files>
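
For example, with the JARs collected under /opt/hive-aux-jars (a hypothetical directory), the hive-env.sh line becomes:

```shell
# hive-env.sh fragment: /opt/hive-aux-jars is an assumed location
# containing hadoop-aws, guava and aws-java-sdk-bundle.
export HIVE_AUX_JARS_PATH=/opt/hive-aux-jars
```

Hive reads this path at startup, so restart HiveServer2 or the CLI after changing it.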
