I am trying to run a MapReduce job with the following command:
hadoop jar /usr/lib/Hadoop/Hadoop-streaming-0.20.2-cdh3u2.jar –file mapper.py –mapper mapper.py –file reducer.py – reducer reducer.py –input /user/training/samplypy.txt –ouput /user/training/pythonMR/output
I get the exception below:
Exception in thread "main" java.lang.ClassNotFoundException: –file
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
I am using Hadoop 1.0.3. I have tried several versions of the hadoop-streaming jar, such as:
hadoop-streaming-0.20.2-cdh3u2.jar
hadoop-streaming-1.2.0.jar
hadoop-streaming.jar
-
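One thing I can tell you is that you are not using the full path in your '-file' arguments:
-file /mapper/location/mapper.py (use the full path and file name)
-mapper mapper.py (correct, only the mapper file name)
-file /reducer/location/reducer.py (use the full path and file name)
-reducer reducer.py (correct, only the reducer file name)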
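
The rule above can be sketched as a small Python helper that assembles the argument list: each `-file` gets the absolute local path of a script, while `-mapper` and `-reducer` get only its base name. The function name `build_streaming_args` is hypothetical and exists only for illustration; the flags and example paths are the ones already used in this thread.

```python
import os


def build_streaming_args(jar, mapper, reducer, input_path, output_path):
    """Return the argv list for a Hadoop streaming job.

    Hypothetical helper: -file receives the absolute local path of each
    script, while -mapper/-reducer receive only the file name.
    """
    mapper_path = os.path.abspath(mapper)
    reducer_path = os.path.abspath(reducer)
    return [
        "hadoop", "jar", jar,
        "-file", mapper_path,
        "-mapper", os.path.basename(mapper_path),
        "-file", reducer_path,
        "-reducer", os.path.basename(reducer_path),
        "-input", input_path,
        "-output", output_path,
    ]


if __name__ == "__main__":
    # Paths taken from the question and the answer above.
    args = build_streaming_args(
        "/usr/lib/Hadoop/Hadoop-streaming-0.20.2-cdh3u2.jar",
        "/mapper/location/mapper.py",
        "/reducer/location/reducer.py",
        "/user/training/samplypy.txt",
        "/user/training/pythonMR/output",
    )
    print(" ".join(args))
```

Building the list programmatically also means every flag is typed with a plain ASCII hyphen, so nothing depends on how the command was copied and pasted.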
-
Make sure your -input and -output point to HDFS paths rather than local paths:
hadoop jar /opt/cloudera/parcels/hadoop-streaming.jar \
-D mapred.reduce.tasks=15 -D stream.map.input.field.separator=',' -D stream.map.output.field.separator=',' \
-D mapred.textoutputformat.separator=',' \
-input /user/temp/in/ \
-output /user/temp/out \
-file /app/qa/python/mapper.py \
-mapper mapper.py \
-file /app/qa/python/reducer.py \
-reducer reducer.py
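
As far as I understand the streaming documentation, `stream.map.output.field.separator` tells the framework where to split each line the mapper prints into a key and a value: at the first occurrence of the separator, with a line that lacks the separator treated as a key with an empty value. A minimal Python model of that behaviour, assuming the ',' separator from the command above (`split_key_value` is a hypothetical name, and this is an illustration rather than Hadoop's actual code):

```python
def split_key_value(line, sep=","):
    """Split a mapper output line into (key, value) at the first separator.

    Illustrative model only, not Hadoop's implementation: a line without
    the separator is returned as a key with an empty value.
    """
    line = line.rstrip("\n")
    key, found, value = line.partition(sep)
    return (key, value) if found else (line, "")


if __name__ == "__main__":
    # A mapper printing "hello,1" would produce key "hello" and value "1".
    print(split_key_value("hello,1\n"))
```

Only the first comma counts in this model, so a mapper whose keys can themselves contain commas would need a different separator.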