r语言 - mr 中的错误(map = map, reduce = reduce, combined = combine, vectorized.reduce, : Hadoop 流失败,错误代码为



我正在运行下面的Rscript gdp。R

#!/usr/bin/env Rscript
Sys.getenv(c("HADOOP_HOME", "HADOOP_CMD", "HADOOP_STREAMING", "HADOOP_CONF_DIR"))
library(rmr2)
library(rhdfs)
setwd("/root/somnath/GDP_data/")
gdp <- read.csv("GDP.csv")
head(gdp)
hdfs.init()
gdp.values <- to.dfs(gdp)
aaplRevenue = 156508
gdp.map.fn <- function(k,v) {
key <- ifelse(v[4] < aaplRevenue, "less", "greater")
keyval(key, 1)
}
count.reduce.fn <- function(k,v) {
keyval(k, length(v))
}
count <- mapreduce(input=gdp.values, map=gdp.map.fn, reduce=count.reduce.fn)
from.dfs(count)
$val

和无法克服mapreduce函数中的以下错误:

流命令失败!错误在mr(map = map, reduce = reduce, combine = combine,矢量化)。减少,:Hadoop流失败,错误码为1调用mapreduce -> mr

[root@kkws029 RHadoop_scripts]# Rscript gdp.R
Loading required package: methods
Loading required package: rJava
HADOOP_CMD=/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
Be sure to run hdfs.init()
  CountryCode Number   CountryName     GDP
1         USA      1  UnitedStates  168000
2         CHN      2         China 9240270
3         JPN      3         Japan 4901530
4         DEU      4       Germany 3634823
5         FRA      5        France 2734949
6         GBR      6 UnitedKingdom 2522261
14/07/08 16:57:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
14/07/08 16:57:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/07/08 16:57:02 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
packageJobJar: [/tmp/hadoop-root/hadoop-unjar4163972761639211537/] [] /tmp/streamjob5583656452134995821.jar tmpDir=null
14/07/08 16:57:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/07/08 16:57:13 INFO mapred.FileInputFormat: Total input paths to process : 1
14/07/08 16:57:20 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
14/07/08 16:57:20 INFO streaming.StreamJob: Running job: job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:20 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:21 INFO streaming.StreamJob:  map 0%  reduce 0%
14/07/08 16:57:26 INFO streaming.StreamJob:  map 50%  reduce 0%
14/07/08 16:57:49 INFO streaming.StreamJob:  map 100%  reduce 100%
14/07/08 16:57:49 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:49 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:49 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:49 ERROR streaming.StreamJob: Job not successful. Error: NA
14/07/08 16:57:49 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  :
  hadoop streaming failed with error code 1
Calls: mapreduce -> mr
In addition: Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
4: In rmr.options("backend.parameters") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
Execution halted
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)

我的错误日志如下:

Loading objects:
  .Random.seed
  aaplRevenue
  count.reduce.fn
  gdp
  gdp.map.fn
  gdp.values
Loading objects:
  backend.parameters
  combine
  combine.file
  combine.line
  debug
  default.input.format
  default.output.format
  in.folder
  in.memory.combine
  input.format
  libs
  map
  map.file
  map.line
  out.folder
  output.format
  pkg.opts
  postamble
  preamble
  profile.nodes
  reduce
  reduce.file
  reduce.line
  rmr.global.env
  rmr.local.env
  save.env
  tempfile
  vectorized.reduce
  verbose
  work.dir
Loading required package: rhdfs
Loading required package: methods
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(c("rhdfs", "rJava", "methods", "rmr2", "stats", "graphics",  :
  can't load rhdfs
Loading required package: rmr2
Loading objects:
  backend.parameters
  combine
  combine.file
  combine.line
  debug
  default.input.format
  default.output.format
  in.folder
  in.memory.combine
  input.format
  libs
  map
  map.file
  map.line
  out.folder
  output.format
  pkg.opts
  postamble
  preamble
  profile.nodes
  reduce
  reduce.file
  reduce.line
  rmr.global.env
  rmr.local.env
  save.env
  tempfile
  vectorized.reduce
  verbose
  work.dir
Warning in Ops.factor(left, right) : < not meaningful for factors
Error in split.default(a.list, ceiling(seq_along(a.list)/every.so.many),  : 
  first argument must be a vector
Calls: <Anonymous> ... <Anonymous> -> do.call -> mapply -> split -> split.default
No traceback available 
Error during wrapup: 
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

任何建议都将非常感谢

谢谢- s

注意:我在stderr日志中引用了错误,其中指定没有找到系统变量HADOOP_CMD。有一种方法,我可以使HADOOP系统的环境变量导出到R?还要注意,我使用的是Sys。getenv(c("HADOOP_HOME",…))在我的脚本开始,但这似乎不工作,因为标准错误建议

请注意,我已经在我的~/.bash_profile

中为HADOOP环境变量添加了以下导出命令
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi
# User specific environment and startup programs
export JRI_PATH=/usr/lib64/R
export R_HOME=/usr/lib64/R
export R_SHARE_DIR=/usr/share/R
#export JRI_LD_PATH=${R_HOME}/library/rJava/jri:${R_HOME}/lib:${R_HOME}/bin
export LD_LIBRARY_PATH=${R_HOME}/library/rJava/jri
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CMD=${HADOOP_HOME}/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export JAVA_HOME=/var/jdk1.7.0_25
#PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
PATH=$PATH:$R_HOME/bin:$JAVA_HOME/bin:$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/mahout:/opt/cloudera/parcels/CDH/lib/hadoop:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce:/var/lib/storm-0.9.0-rc2/lib:$HADOOP_CMD:$HADOOP_STREAMING:$HADOOP_CONF_DIR
export PATH

流媒体命令失败!

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

count <- mapreduce(input=gdp.values, map = gdp.map.fn, reduce = count.reduce.fn, #output = output, input.format = "text",
combine = T)

我在做同样的代码…但没有给我想要的答案……所以我做了一些练习,把每个东西都应用在地图的每个黑色上,最后我发现了作者犯的错误。这是修改代码。你可以输入

最新更新