Streaming Command Failed! when running MapReduce Python code on a single-node Hadoop cluster on CentOS 7



I have already run MapReduce Java code successfully on this machine. Now I am trying to run MapReduce code written in Python on the same machine. For this I am using Hadoop 3.2.1 and hadoop-streaming-3.2.1.jar.

I tested the code locally with the command

[dsawale@localhost ~]$ cat Desktop/sample.txt | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py | sort | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py

and it produces the correct output.
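The question does not show WordCountMapper.py or WordCountReducer.py. For reference, a typical streaming word count looks roughly like the hypothetical sketch below (not the asker's code). The mapper reads lines from stdin and prints word<TAB>1; the reducer assumes its input is sorted by key, which sort provides in the local test and Hadoop's shuffle provides on the cluster. local_pipeline is an in-process stand-in for the cat | mapper | sort | reducer test above. The code avoids f-strings so it also runs under Python 2.7, which is what python points to on a stock CentOS 7.

```python
#!/usr/bin/env python
"""Hypothetical word-count mapper/reducer for Hadoop Streaming (not the asker's code)."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reducer(lines):
    """Sum the counts per word; assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


def local_pipeline(text):
    """In-process equivalent of: cat file | mapper | sort | reducer."""
    return list(reducer(sorted(mapper(text.splitlines()))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wordcount.py map < input.txt | sort | python wordcount.py reduce
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in stage(sys.stdin):
        print(record)
```

Keeping both stages in one file is only a convenience for this sketch; separate mapper and reducer scripts, as in the question, work the same way.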

But when I try to run it on the Hadoop cluster with the command

[dsawale@localhost ~]$ hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar -mapper mapper.py -reducer reducer.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -input /sample.txt -output pysamp

I get the following output:

packageJobJar: [PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, /tmp/hadoop-unjar6715579504628929924/] [] /tmp/streamjob3211585412475799030.jar tmpDir=null
Streaming Command Failed!

This is my first Python MapReduce program. Could you help me get rid of this error? Thanks.

Configuration files:

mapred-site.xml:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>

core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>  
<name>dfs.permission</name>
<value>false</value>
</property>
<property>  
<name>dfs.namenode.name.dir</name>
<value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/datanode</value>
</property>
</configuration>

yarn-site.xml:

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

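The file paths passed to the -mapper and -reducer arguments are incorrect: -mapper mapper.py and -reducer reducer.py name files that are never shipped with the job, while -file ships WordCountMapper.py twice and WordCountReducer.py not at all.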

Try:

hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar \
-mapper PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-reducer PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py \
-input /sample.txt \
-output pysamp
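The mismatch described in this answer can be caught before submitting the job. Below is a hypothetical helper, check_streaming_args (it is not part of Hadoop), sketched under the assumption that each file passed with -file is made available in the task's working directory under its base name. It reports any -mapper or -reducer script whose base name is not among the shipped files, as well as files shipped more than once. Treating a leading python token as an interpreter prefix is a simple heuristic.

```python
import os
import shlex


def check_streaming_args(argv):
    """Return a list of problems with the -mapper/-reducer/-file arguments.

    Hypothetical pre-submit check, not part of Hadoop. Assumes every -file
    path is available in the task's working directory under its base name.
    """
    shipped = []
    commands = {}
    # Walk consecutive (flag, value) pairs of the argument list.
    for flag, value in zip(argv, argv[1:]):
        if flag == "-file":
            shipped.append(os.path.basename(value))
        elif flag in ("-mapper", "-reducer"):
            commands[flag] = value

    problems = []
    # The same file shipped twice often means a different one was forgotten.
    for name in sorted({n for n in shipped if shipped.count(n) > 1}):
        problems.append("%s is shipped more than once" % name)
    for flag, command in sorted(commands.items()):
        tokens = shlex.split(command)
        if not tokens:
            continue
        # Heuristic: skip an interpreter prefix such as "python WordCountMapper.py".
        if len(tokens) > 1 and tokens[0].startswith("python"):
            script = tokens[1]
        else:
            script = tokens[0]
        if os.path.basename(script) not in shipped:
            problems.append("%s %s is not shipped with -file" % (flag, script))
    return problems


if __name__ == "__main__":
    import sys
    # Usage: python check_args.py hadoop jar ... -mapper ... -file ...
    for problem in check_streaming_args(sys.argv[1:]):
        print(problem)
```

Applied to the command from the question, it reports that WordCountMapper.py is shipped more than once and that neither mapper.py nor reducer.py is shipped with -file.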
