Hadoop streaming failed with java.io.FileNotFoundException

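I wrote a map-only Python MapReduce job that reads data from standard input and processes it to produce some output. It runs fine when I execute it locally. However, when I try to run it with Hadoop, I get a FileNotFoundException: it cannot locate the mapper.py file. Here is the command I use to run the script: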

hadoop jar hadoop-streaming-1.1.1.jar -D mapred.reduce.tasks=0 -file "$PWD/mapper.py" -mapper "$PWD/mapper.py" -input "relevance/test.txt" -output "relevance/test_output_8.txt"

The test.txt file has also been copied to HDFS.

Error:

java.io.FileNotFoundException: File /data1/mapr-hadoop/mapred/local/taskTracker/***********/job_201405060940_908425/attempt_201405060940_908425_m_000000_0/work/******/mapper.py does not exist.

Does anyone know what I am missing here?

Removing $PWD from the file paths solved the problem. Working command:

hadoop jar hadoop-streaming-1.1.1.jar -D mapred.reduce.tasks=0 -file "mapper.py" -mapper "mapper.py" -input "relevance/test.txt" -output "relevance/test_output_8.txt"

Also, make sure the paths are specified inside double quotes (" "). I have seen many examples online that leave the quotes out.
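To avoid repeating the mistake, the command can be assembled by a small helper that always passes the bare file name to -mapper. This is only a sketch: build_streaming_command is a hypothetical name, not part of Hadoop, and the reasoning in the comments (the shipped script being looked up relative to the task's working directory) is an assumption inferred from the path in the error message. The path given to -file is passed through unchanged, so calling the helper with "mapper.py" from the script's directory reproduces the working command.

```python
import os
import shlex


def build_streaming_command(mapper_path, input_path, output_path,
                            streaming_jar="hadoop-streaming-1.1.1.jar"):
    """Return the argument list for a map-only Hadoop streaming job.

    Hypothetical helper for illustration; not part of Hadoop itself.
    """
    # Assumption: the script shipped with -file is resolved relative to the
    # task's working directory, so -mapper gets only the file name.
    mapper_name = os.path.basename(mapper_path)
    return [
        "hadoop", "jar", streaming_jar,
        "-D", "mapred.reduce.tasks=0",   # map-only job: no reducers
        "-file", mapper_path,            # local script to ship, path as given
        "-mapper", mapper_name,          # bare name, no $PWD prefix
        "-input", input_path,
        "-output", output_path,
    ]


if __name__ == "__main__":
    args = build_streaming_command(
        "mapper.py", "relevance/test.txt", "relevance/test_output_8.txt"
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(a) for a in args))
```

The list can be handed to subprocess.run(args, check=True) as-is; because no shell is involved, the quoting issue mentioned above does not arise.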
