1) We are trying to use the S3DistCp jar (http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html#emr-s3distcp-verisons) to copy HDFS files from the AWS China Hadoop master instance to an AWS China S3 bucket.
2) We are running the following command on the AWS China Hadoop master:
hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src hdfs://${HDFS_DIR} --dest s3n://${S3_BUCKETNAME}/${Folder_Name}/ --s3Endpoint=s3.cn-north-1.amazonaws.com.cn
3) When we run this "s3-dist-cp" command, the following exception is thrown:
16/02/22 08:39:52 INFO s3distcp.S3DistCp: Using output path 'hdfs:/tmp/f6a864f8-d70d-426f-b05f-08f7d0097fd9/output'
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/gson/internal/Pair
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.getSrcPrefixes(S3DistCp.java:468)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.createInputFileList(S3DistCp.java:521)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:850)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:720)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: com.google.gson.internal.Pair
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 13 more
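The NoClassDefFoundError above points to a Gson dependency that is missing from (or mismatched on) the classpath used by the s3-dist-cp job, rather than a problem with the command itself; com.google.gson.internal.Pair is an internal Gson class that is not present in every Gson release. As a hedged workaround, if a compatible Gson jar exists somewhere on the master node (the path below is only an assumed example), it could be prepended to HADOOP_CLASSPATH before re-running the copy:

# Look for a Gson jar on the master node
find / -name "gson*.jar" 2>/dev/null

# Prepend the jar to the classpath picked up by `hadoop jar` (path is an assumed example), then retry the copy
export HADOOP_CLASSPATH=/usr/share/java/gson.jar:${HADOOP_CLASSPATH}
hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src hdfs://${HDFS_DIR} --dest s3n://${S3_BUCKETNAME}/${Folder_Name}/ --s3Endpoint=s3.cn-north-1.amazonaws.com.cn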
4) Could you let us know whether there is any alternative to "s3-dist-cp" for copying HDFS files from the AWS China Hadoop master instance to an AWS China S3 bucket?
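For reference, one alternative we are aware of is plain hadoop distcp, which ships with Hadoop and can also write to S3. The sketch below assumes the s3a connector and its fs.s3a.* properties (including fs.s3a.endpoint for the China region) are usable on the cluster; bucket and folder names are placeholders:

hadoop distcp -Dfs.s3a.endpoint=s3.cn-north-1.amazonaws.com.cn hdfs:///HDFS_Folder_Name/ s3a://my-bucket/folder/

On an EMR cluster the EMRFS s3:// scheme may also work directly as the distcp destination, in which case the endpoint property should not be needed.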
Thanks and regards,
Amit
It turned out there was some problem with the AWS China Hadoop EMR cluster we had created. We created a new AWS China Hadoop EMR cluster, and with the "s3-dist-cp" command we are now able to upload HDFS files from the AWS China Hadoop master to the AWS China S3 bucket.
Sample s3-dist-cp command:
s3-dist-cp --src=hdfs:///HDFS_Folder_Name/ --dest=s3n://my-bucket/folder
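For the China (Beijing) region it may still be worth passing the endpoint flag from the original command explicitly; a fuller form of the command above (bucket and folder names are placeholders) would be:

s3-dist-cp --src=hdfs:///HDFS_Folder_Name/ --dest=s3n://my-bucket/folder --s3Endpoint=s3.cn-north-1.amazonaws.com.cn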