Passing an S3 location as an argument to a hadoop jar

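I want to pass the location of a file in Amazon S3 as an argument to my Hadoop jar. The location holds an XML file that I need to parse in the driver class of my MapReduce job. How do I pass that location, and where do I specify the S3 credentials?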

You cannot run an MR job using an s3n location. Instead, upload your jar file and input to S3 and run the job through elastic-mapreduce, as follows:

elastic-mapreduce --jar s3://mybucket/mycode.jar \
    --args "-D,mapred.reduce.tasks=0" \
    --arg s3://mybucket/input \
    --arg s3://mybucket/output
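The S3 locations passed with --arg reach the driver as ordinary strings in its args array. The Java sketch below shows one way to check such a string and split it into bucket and key using java.net.URI. The class S3Location and its parse method are hypothetical names for illustration, not part of Hadoop or the EMR tools, and the sketch assumes generic options such as -D have already been stripped from the arguments.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop or the EMR tools: splits an S3
// location such as "s3://mybucket/input" into scheme, bucket and key.
final class S3Location {
    final String scheme;
    final String bucket;
    final String key;

    private S3Location(String scheme, String bucket, String key) {
        this.scheme = scheme;
        this.bucket = bucket;
        this.key = key;
    }

    static S3Location parse(String location) {
        // URI.create throws IllegalArgumentException for malformed strings.
        URI uri = URI.create(location);
        String scheme = uri.getScheme();
        // Accept only the s3:// and s3n:// schemes used in this thread.
        if (!"s3".equals(scheme) && !"s3n".equals(scheme)) {
            throw new IllegalArgumentException("Not an S3 location: " + location);
        }
        // getHost() is null when the bucket name is not a valid host name
        // (for example, a legacy bucket name containing an underscore).
        String bucket = uri.getHost();
        if (bucket == null) {
            throw new IllegalArgumentException("No usable bucket name in: " + location);
        }
        // Strip the leading '/' so the key matches the S3 object key.
        String path = uri.getPath();
        String key = path.startsWith("/") ? path.substring(1) : path;
        return new S3Location(scheme, bucket, key);
    }

    public static void main(String[] args) {
        // Assumes generic options such as "-D" were already removed (for
        // example by ToolRunner), so every remaining argument is an S3 location.
        for (String arg : args) {
            S3Location loc = parse(arg);
            System.out.println(loc.scheme + " bucket=" + loc.bucket + " key=" + loc.key);
        }
    }
}
```

To actually read the object, the driver would hand the same string to Hadoop's FileSystem API rather than fetch it by hand; the parsing here is only useful for validation or logging.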

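You need to set the S3 credentials in a configuration file such as mapred-site.xml or core-site.xml. Alternatively, put them in a custom configuration file and pass that file with -conf; your hadoop jar command will then look like this:

hadoop jar <jar_file_name> <class_name> -conf <custom_conf> <arguments>

Note that -conf is a Hadoop generic option: it is only picked up when your driver implements Tool and is launched through ToolRunner. The configuration file needs the following properties: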

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>AWS-ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>AWS-SECRET-KEY</value>
  </property>
</configuration>
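For the parsing part of the question: once the credentials are configured, the driver can open the S3 file through Hadoop's FileSystem API and parse it with the JDK's DOM parser. The Java sketch below shows only the parsing step, on an in-memory stream so it runs without a cluster. XmlSettingsReader and the <settings>/<setting name="..."> layout are hypothetical, so adapt the element names to your actual file; the Hadoop calls that would supply the stream appear only in comments.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical driver-side helper. The file layout is an assumption:
//   <settings><setting name="...">value</setting>...</settings>
final class XmlSettingsReader {

    // Returns the name/value pairs in document order.
    static Map<String, String> read(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so a downloaded file cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(in);

        Map<String, String> settings = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("setting");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element setting = (Element) nodes.item(i);
            settings.put(setting.getAttribute("name"), setting.getTextContent().trim());
        }
        return settings;
    }

    public static void main(String[] args) throws Exception {
        // Inside the run(String[] args) method of a driver that extends
        // Configured and implements Tool, the stream would come from S3:
        //   Path p = new Path(args[0]);
        //   try (InputStream in = p.getFileSystem(getConf()).open(p)) { ... }
        // For an s3n:// path, Hadoop reads the fs.s3n.* credentials from the
        // configuration. Here an in-memory document stands in for that file.
        String xml = "<settings><setting name=\"threshold\">10</setting></settings>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(read(in));
        }
    }
}
```

Parsing in the driver keeps the values available for setting job configuration before submission; the mappers never see the file unless the driver copies values into the Configuration.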
