I have configured Hadoop 2.6.0 on Ubuntu 14.04. I am initially running the wordcount MapReduce program to understand how MapReduce works. I am facing some problems while accessing the file system. My Hadoop home directory is /opt/hadoop2.6.0.
-
Driver code
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// configuration should contain reference to your namenode
FileSystem hdfs = FileSystem.get(new Configuration());
Path workingDir = hdfs.getWorkingDirectory();

Path newFolderPath = new Path("/output");
newFolderPath = Path.mergePaths(workingDir, newFolderPath);
if (hdfs.exists(newFolderPath)) {
    hdfs.delete(newFolderPath, true); // Delete existing Directory
}
hdfs.mkdirs(newFolderPath);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, newFolderPath);

System.exit(job.waitForCompletion(true) ? 0 : 1); // line no. 76
// job.submit();
-
core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
    </property>
</configuration>
-
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop-2.6.0/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop-2.6.0/dfs/data</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>localhost:50070</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
-
yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>
            %HADOOP_HOME%etchadoop,
            %HADOOP_HOME%sharehadoopcommon*,
            %HADOOP_HOME%sharehadoopcommonlib*,
            %HADOOP_HOME%sharehadoophdfs*,
            %HADOOP_HOME%sharehadoophdfslib*,
            %HADOOP_HOME%sharehadoopmapreduce*,
            %HADOOP_HOME%sharehadoopmapreducelib*,
            %HADOOP_HOME%sharehadoopyarn*,
            %HADOOP_HOME%sharehadoopyarnlib*
        </value>
    </property>
</configuration>
-
Running the MapReduce jar:
hadoop jar /home/ifs-admin/wordcount.jar WordCount /user/ifs/input
Exception on execution:
15/08/23 12:12:25 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/08/23 12:12:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/ifs-admin/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at WordCount.main(WordCount.java:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
-
If I delete the output directory, it shows the following error instead:
Exception in thread "main" ENOENT: No such file or directory at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:652) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:490) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:599) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:182) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314) at WordCount.main(WordCount.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
How can I solve this problem?
Try this code. Note: do not create the output directory yourself; Hadoop creates it automatically when the job runs.
// conf should point at your namenode (fs.defaultFS = hdfs://localhost:9000)
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
Path workingDir = hdfs.getWorkingDirectory();

Path newFolderPath = new Path("/output");
newFolderPath = Path.mergePaths(workingDir, newFolderPath);
if (hdfs.exists(newFolderPath)) {
    hdfs.delete(newFolderPath, true); // delete the existing output directory recursively
}

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, newFolderPath);

System.exit(job.waitForCompletion(true) ? 0 : 1); // line no. 76
// job.submit();
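As an aside, you can also clear a stale output directory from the shell before resubmitting the job instead of doing it in the driver. A minimal example, using the path reported in the exception in the question (adjust it to your own output path):

hadoop fs -rm -r /user/ifs-admin/output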
When we set hadoop.tmp.dir, the namenode dir, or the datanode dir under the /opt folder, Hadoop is unable to create the directories it needs for HDFS. See: https://unix.stackexchange.com/questions/11544/what-is-the-difference-between-opt-and-usr-local.
I have changed hadoop.tmp.dir in core-site.xml to /usr/local/hadoop/dfs/data.
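For reference, the changed property in core-site.xml looks like this:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
</property>

The directory also has to exist and be writable by the user that runs Hadoop. A minimal sketch, assuming that user is ifs-admin (taken from the paths above; adjust the user and path to your setup):

sudo mkdir -p /usr/local/hadoop/dfs/data
sudo chown -R ifs-admin:ifs-admin /usr/local/hadoop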