How to run a MapReduce program from the Ubuntu terminal



My Hadoop path is /usr/local/hadoop, the jars are under /usr/local/hadoop/share, and I am using Java 7. Please help me solve this. My JAVA_HOME=/usr/lib/jvm/jdk-7-amd64

You did give some detail, but either way you can run the jar file with these steps:

1- Add the dependencies to your ~/.bashrc:

export HADOOP_PREFIX=/path/to/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
export CLASSPATH=$CLASSPATH:$HADOOP_PREFIX/*:.

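2- Run the job as follows (the hadoop command lives in $HADOOP_PREFIX/bin, which the lines above put on your PATH):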

hadoop jar /path/to/jar/jar-name name.of.the.driver.class.in.jar <input-path> <output-path>

It would help if you shared the exact commands you ran on your system.

I recently ran it from the terminal using the steps below. My system is Ubuntu 14.04 LTS.

Follow these steps.
Compilation process for MapReduce, by Kamalakar Thakare:
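--> STEP 1. Start Hadoop. (On Hadoop 2, start-all.sh is deprecated in favor of start-dfs.sh followed by start-yarn.sh, but it still works.)
$ start-all.sh
--> STEP 2. Check that all Hadoop components are up and running.
$ jps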
--> STEP 3. Assume the environment variables are set as follows:
export JAVA_HOME=/usr/java/default          # Any Java version is fine; point this at your own JDK instead of the default.
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar  # This is the MOST IMPORTANT file, so make sure you have it. If it is not here, it lives in a different location on your machine.
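One way to confirm the tools.jar assumption in STEP 3 before going further is a small shell check. This is only a sketch: find_tools_jar is a made-up helper name, and it assumes a JDK 8 or older layout where tools.jar sits under lib/ (as with the Java 7 install in the question).

```shell
# find_tools_jar [JDK_DIR]
# Hypothetical helper: prints the path of tools.jar under JDK_DIR
# (default: $JAVA_HOME) or returns non-zero if the file is missing.
find_tools_jar() {
  jdk_dir="${1:-${JAVA_HOME:-}}"
  candidate="$jdk_dir/lib/tools.jar"
  if [ -f "$candidate" ]; then
    printf '%s\n' "$candidate"
  else
    echo "tools.jar not found under $jdk_dir/lib" >&2
    return 1
  fi
}
```

Calling find_tools_jar with no argument checks $JAVA_HOME. A non-zero exit status means HADOOP_CLASSPATH would point at a file that does not exist.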
--> STEP 4. Now copy your source code to your home directory. Note that the code itself does not need to be stored on HDFS.
--> STEP 5. Now compile the main code with the command below.
$ javac -classpath <hadoop-core.jar file> -d <Your New Directory>/ <sourceCode.java>
Meaning of this command:
* It simply compiles your Java source file, sourceCode.java.
* The <hadoop-core.jar file> must contain all the libraries your source code imports. One version that works, and where to get it:
http://www.java2s.com/Code/Jar/h/Downloadhadoop0201devcorejar.htm
The download link is at the bottom of that page; the file is named hadoop-0.20.1-dev-core.jar.zip. Download and extract it to get a single .jar file. That .jar is what you pass as <hadoop-core.jar file> in the command above.
* The -d option names the directory that receives all the generated class files. Older javac versions such as the one in Java 7 expect this directory to exist already, so create it first with mkdir if needed.
--> STEP 6. MapReduce code consists of three main components: 1. the Mapper class, 2. the Driver class, 3. the Reducer class.
So it makes sense to build one jar file that contains the class definitions of all three.
Use the command below to generate the jar file.
$ jar -cvf <File you have to create> -C <Directory you have obtained in previous command> .
* Remember that the dot '.' at the end is required; it stands for the entire contents of the directory.
* option -c creates a new archive
  option -v generates verbose output on standard output
  option -f specifies the archive file name

For example:
$ mkdir -p LineCount                                                          # create LineCount/ first; older javac versions do not create it.
$ javac -classpath hadoop-0.20.1-dev-core.jar -d LineCount/ LineCount.java  # compile the class files into LineCount/.
$ jar -cvf LineCount.jar -C LineCount/ .                                    # LineCount.jar is the jar file being created and LineCount/ is my directory.
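The compile and jar commands above can be wrapped in a small shell function. This is a sketch, not part of the original answer: build_job_jar, run, and DRY_RUN are hypothetical names. It assumes the source file is named after its main class, and when no classpath is passed it falls back to the output of hadoop classpath, which prints the classpath of the installed Hadoop and can stand in for a separately downloaded core jar.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_job_jar SOURCE.java [CLASSPATH]
# Compiles SOURCE.java into a directory named after it, then packs that
# directory into a jar with the same base name.
build_job_jar() {
  src="$1"
  name="$(basename "$src" .java)"
  # Assumption: with no explicit classpath, ask the installed Hadoop for one.
  classpath="${2:-$(hadoop classpath)}"
  run mkdir -p "$name" || return 1
  run javac -classpath "$classpath" -d "$name/" "$src" || return 1
  run jar -cvf "$name.jar" -C "$name/" .
}
```

With DRY_RUN=1 the function only prints the commands, for example DRY_RUN=1 build_job_jar LineCount.java hadoop-0.20.1-dev-core.jar, which is a safe way to check them before a real run.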

--> STEP 7. Now it is time to run your code on the Hadoop framework.
Make sure your input files are already on HDFS. If not, add them using
$ hadoop fs -put <source file path> /input

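--> STEP 8. Now run your program using your jar file.
$ hadoop jar <your jar file> <driver class name> /input/<your file name> /output/<output file name>
The second argument is the name of the driver (main) class. In the LineCount example it is the same as the directory name without the trailing /, because the class and the directory share a name.
For example,
if my jar file is test.jar,
the directory I created is test/, named after the driver class test,
my input file is /input/a.txt,
and I want the entire output in /output/test, then my command is
$ hadoop jar test.jar test /input/a.txt /output/test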
--> STEP 9. Well done: by now you have crossed the bridge of a thousand errors where other programmers are still stuck.
After your program completes successfully, the output directory contains two files.
One is _SUCCESS, an empty marker file that signals the job completed.
The second is part-r-00000, which contains the actual output.
Read it using
$ hadoop fs -cat /output/<your file>/part-r-00000
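STEPs 7 to 9 can be chained in one shell function. This is a minimal sketch under stated assumptions: run_job and the HADOOP_CMD override are hypothetical names, /input is assumed to already exist as a directory on HDFS, and the job is assumed to use a single reducer so that the output lands in part-r-00000.

```shell
# run_job JAR DRIVER_CLASS LOCAL_INPUT OUTPUT_NAME
# Uploads the input, runs the job, and prints the result.
run_job() {
  jar_file="$1"
  driver="$2"
  local_input="$3"
  output_name="$4"
  # HADOOP_CMD is a hypothetical override; it defaults to the real hadoop command.
  hadoop_cmd="${HADOOP_CMD:-hadoop}"
  input_name="$(basename "$local_input")"

  # STEP 7: copy the input file into /input on HDFS (stops here if the copy fails).
  "$hadoop_cmd" fs -put "$local_input" /input || return 1
  # STEP 8: run the driver class from the jar.
  "$hadoop_cmd" jar "$jar_file" "$driver" "/input/$input_name" "/output/$output_name" || return 1
  # STEP 9: print the output file written by the reducer.
  "$hadoop_cmd" fs -cat "/output/$output_name/part-r-00000"
}
```

For the example in STEP 8 the call would be run_job test.jar test a.txt test. Setting HADOOP_CMD=echo prints the arguments of each hadoop call instead of running it, which is a cheap way to check the paths first.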

IMPORTANT NOTES:
1.  If you get an auxService error while creating the job, make sure YARN (the resource manager) has the auxiliary services configured. If it does not, add the following lines to your yarn-site.xml file.
Its location is /usr/local/hadoop/etc/hadoop
Copy this and paste it into yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
2. If you get an error for Job.getInstance while running your code on Hadoop, Hadoop could not create the job instance that way (a likely cause is that the jar you compiled against is older than that method). Simply replace the Job.getInstance statement with
Job job = new Job(configurationObject,"Job Dummy Name");
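For note 1, a quick way to see whether yarn-site.xml already carries the shuffle setting is a plain text search. The sketch below is an assumption-laden convenience, not an XML parser: has_shuffle_service is a made-up name, and it expects the name and value tags to each sit on one line, as in the snippet above.

```shell
# has_shuffle_service FILE
# Returns 0 when FILE sets yarn.nodemanager.aux-services and contains the
# mapreduce_shuffle value, 1 when either is missing, 2 when FILE does not exist.
has_shuffle_service() {
  site_file="$1"
  [ -f "$site_file" ] || return 2
  grep -qF '<name>yarn.nodemanager.aux-services</name>' "$site_file" &&
    grep -qF '<value>mapreduce_shuffle</value>' "$site_file"
}
```

With the paths from this answer the call would be has_shuffle_service /usr/local/hadoop/etc/hadoop/yarn-site.xml. If you had to add the properties, YARN most likely needs a restart (stop-yarn.sh, then start-yarn.sh) before the change takes effect.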

References:
https://dataheads.wordpress.com/2013/11/21/hadoop-2-setup-on-64-bit-ubuntu-12-04-part-1/
https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-lines-in-a-file-using-map-reduce-framework
https://sites.google.com/site/hadoopandhive/home/how-to-run-and-compile-a-hadoop-program
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
