I ran into a problem while writing a reducer for MapReduce. I want to get just the first 10 lines of a very large file, so I used a for loop with a break. However, the break makes the job fail on Hadoop, so I am looking for another way to do this:
```python
import fileinput

counter = 0
limit = 10  # first 10 lines

for line in fileinput.input():
    if counter > limit:
        break
    line = line.strip()
    print(line)
    counter += 1
```
Error log:

```
Error: java.io.IOException: subprocess exited successfully
R/W/S=6936/19/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=s2132211
HADOOP_USER=null
last tool output: |29670 YOU HAVE AATO|
Broken pipe
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:129)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
```
First of all, either the formatting of your example is off, or you have a logic error: `print(line)` and `counter += 1` need to be INSIDE the for loop.
A simpler way to write it is:
```python
for counter, line in enumerate(fileinput.input()):
    if counter > limit:
        break
    line = line.strip()
    print(line)
```
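As a side note, the loop can be sanity-checked outside of Hadoop first. A minimal standalone version for local testing (the file name `reducer.py` and the `limit` value are assumptions on my part):

```python
#!/usr/bin/env python
# reducer.py -- standalone sketch of the loop above, for local testing only
import fileinput

limit = 10  # assumed from the question: keep only the first 10 lines

# With no file arguments, fileinput.input() reads from standard input,
# which is how Hadoop Streaming feeds data to the reducer.
for counter, line in enumerate(fileinput.input()):
    if counter > limit:
        break
    print(line.strip())
```

Piping a small sample through it (for example `seq 100 | python reducer.py`) should show whether the loop logic itself behaves, independent of the streaming job.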
Now, if that still doesn't solve the problem, a few more questions:
1) Can you see any output from the program at all (is it actually printing anything from inside the for loop)?
2) Does the program crash immediately, or only after running for a while?
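Since the question is explicitly looking for a way to avoid the break, one way to sidestep the error is to stop printing once the limit is reached but keep reading the remaining input, so the script does not exit while the Java side is still writing to the pipe. A rough sketch, with illustrative variable names:

```python
import sys

limit = 10    # assumed from the question: keep only the first 10 lines
printed = 0

for line in sys.stdin:
    if printed < limit:
        print(line.strip())
        printed += 1
    # deliberately no break: consume the rest of the input so the Java
    # PipeReducer can finish writing without hitting "Broken pipe"
```

The trade-off is that the reducer still reads its entire input, but it avoids closing the pipe early.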