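I have a running Hadoop (2.6.0) cluster with 6 nodes (including the master) and want to run a Pig (0.14.0) script in mapreduce mode. The script runs without errors, but unfortunately it seems to run only on the master node. While researching the problem I tried several changes to the Hadoop configuration files, without success.
Can you help me figure out how to get Pig to work across the whole cluster?
Here is some information:
Configuration on every node:
General: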
/etc/hosts
127.0.0.1 localhost
192.168.101.3 master
192.168.101.4 node1
192.168.101.5 node2
192.168.101.6 node3
192.168.101.7 node4
192.168.101.8 node5
Hadoop:
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
<description>...</description>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8041</value>
<description>...</description>
</property>
<property>
<name>yarn.nodemanager.aux_services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux_services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>master:19888/jobhistory/logs/</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
<description>...</description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>master:54311</value>
<description>...</description>
</property>
<property>
<name>mapred.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
<description>...</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
<description>...</description>
</property>
</configuration>
Pig output:
15/01/09 13:12:54 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/01/09 13:12:54 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
15/01/09 13:12:54 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2015-01-09 13:12:54,845 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
2015-01-09 13:12:54,845 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hduser/pig_1420805574843.log
2015-01-09 13:12:56,450 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2015-01-09 13:12:56,876 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-01-09 13:12:56,886 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:12:56,886 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000/
2015-01-09 13:12:58,146 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:54311
2015-01-09 13:12:59,195 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:12:59,418 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:12:59,598 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:00,496 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER,UNION
2015-01-09 13:13:00,618 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:00,634 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-01-09 13:13:00,713 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-01-09 13:13:00,987 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-01-09 13:13:01,037 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-01-09 13:13:01,038 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-01-09 13:13:01,079 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:01,103 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-01-09 13:13:01,105 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2015-01-09 13:13:01,149 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-01-09 13:13:01,161 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-01-09 13:13:01,161 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-01-09 13:13:01,161 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-01-09 13:13:01,167 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-01-09 13:13:19,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-1277984423/tmp-918732110/pig-0.14.0-core-h2.jar
2015-01-09 13:13:20,063 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1277984423/tmp883771618/automaton-1.11-8.jar
2015-01-09 13:13:20,621 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1277984423/tmp-1372558595/antlr-runtime-3.4.jar
2015-01-09 13:13:26,600 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-1277984423/tmp-1556176302/guava-11.0.2.jar
2015-01-09 13:13:29,300 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-1277984423/tmp145012374/joda-time-2.1.jar
2015-01-09 13:13:29,718 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-01-09 13:13:29,736 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-01-09 13:13:29,736 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-01-09 13:13:29,736 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-01-09 13:13:29,840 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-01-09 13:13:29,841 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-01-09 13:13:30,191 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2015-01-09 13:13:30,384 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:30,785 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-01-09 13:13:30,949 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:13:30,949 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:13:31,250 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 52
2015-01-09 13:13:31,309 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:13:31,309 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:13:31,355 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 24
2015-01-09 13:13:31,378 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:13:31,379 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:13:31,394 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 6
2015-01-09 13:13:31,587 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:82
2015-01-09 13:13:31,706 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:13:32,475 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local647507189_0001
2015-01-09 13:13:33,628 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805612754/pig-0.14.0-core-h2.jar <- /home/hduser/pig-0.14.0-core-h2.jar
2015-01-09 13:13:33,758 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp-1277984423/tmp-918732110/pig-0.14.0-core-h2.jar as file:/app/hadoop/tmp/mapred/local/1420805612754/pig-0.14.0-core-h2.jar
2015-01-09 13:13:33,759 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805612755/automaton-1.11-8.jar <- /home/hduser/automaton-1.11-8.jar
2015-01-09 13:13:33,770 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp-1277984423/tmp883771618/automaton-1.11-8.jar as file:/app/hadoop/tmp/mapred/local/1420805612755/automaton-1.11-8.jar
2015-01-09 13:13:33,772 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805612756/antlr-runtime-3.4.jar <- /home/hduser/antlr-runtime-3.4.jar
2015-01-09 13:13:33,781 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp-1277984423/tmp-1372558595/antlr-runtime-3.4.jar as file:/app/hadoop/tmp/mapred/local/1420805612756/antlr-runtime-3.4.jar
2015-01-09 13:15:54,534 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp206201348/tmp-1481268210/guava-11.0.2.jar
2015-01-09 13:15:56,233 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/hduser/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp206201348/tmp-1921418840/joda-time-2.1.jar
2015-01-09 13:15:56,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-01-09 13:15:56,366 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-01-09 13:15:56,367 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-01-09 13:15:56,368 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-01-09 13:15:56,483 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-01-09 13:15:56,486 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-01-09 13:15:56,505 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2015-01-09 13:15:56,582 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:15:56,695 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-01-09 13:15:57,070 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:15:57,070 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:15:57,197 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 52
2015-01-09 13:15:57,227 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:15:57,228 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:15:57,263 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 24
2015-01-09 13:15:57,289 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-01-09 13:15:57,289 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-01-09 13:15:57,306 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 6
2015-01-09 13:15:57,393 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:82
2015-01-09 13:15:57,416 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:15:57,791 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local561414911_0001
2015-01-09 13:15:58,741 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758017/pig-0.14.0-core-h2.jar <- /home/hduser/pig-0.14.0-core-h2.jar
2015-01-09 13:15:58,755 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp1912320441/pig-0.14.0-core-h2.jar as file:/app/hadoop/tmp/mapred/local/1420805758017/pig-0.14.0-core-h2.jar
2015-01-09 13:15:58,757 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758018/automaton-1.11-8.jar <- /home/hduser/automaton-1.11-8.jar
2015-01-09 13:15:58,766 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp-886499198/automaton-1.11-8.jar as file:/app/hadoop/tmp/mapred/local/1420805758018/automaton-1.11-8.jar
2015-01-09 13:15:58,768 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758019/antlr-runtime-3.4.jar <- /home/hduser/antlr-runtime-3.4.jar
2015-01-09 13:15:58,778 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp1437387446/antlr-runtime-3.4.jar as file:/app/hadoop/tmp/mapred/local/1420805758019/antlr-runtime-3.4.jar
2015-01-09 13:15:58,779 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758020/guava-11.0.2.jar <- /home/hduser/guava-11.0.2.jar
2015-01-09 13:15:58,786 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp-1481268210/guava-11.0.2.jar as file:/app/hadoop/tmp/mapred/local/1420805758020/guava-11.0.2.jar
2015-01-09 13:15:58,787 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /app/hadoop/tmp/mapred/local/1420805758021/joda-time-2.1.jar <- /home/hduser/joda-time-2.1.jar
2015-01-09 13:15:58,795 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://master:9000/tmp/temp206201348/tmp-1921418840/joda-time-2.1.jar as file:/app/hadoop/tmp/mapred/local/1420805758021/joda-time-2.1.jar
2015-01-09 13:15:58,953 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758017/pig-0.14.0-core-h2.jar
2015-01-09 13:15:58,954 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758018/automaton-1.11-8.jar
2015-01-09 13:15:58,955 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758019/antlr-runtime-3.4.jar
2015-01-09 13:15:58,955 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758020/guava-11.0.2.jar
2015-01-09 13:15:58,955 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/app/hadoop/tmp/mapred/local/1420805758021/joda-time-2.1.jar
2015-01-09 13:15:58,970 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2015-01-09 13:15:58,973 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local561414911_0001
2015-01-09 13:15:58,973 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases records_infobox,records_mappingbased,records_person,records_union,result_filter
2015-01-09 13:15:58,973 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: records_person[10,17],records_person[-1,-1],null[-1,-1],records_union[13,16],records_infobox[6,18],records_infobox[-1,-1],result_filter[16,16],records_mappingbased[8,23],records_mappingbased[-1,-1],null[-1,-1] C: R:
2015-01-09 13:15:58,990 [Thread-19] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2015-01-09 13:15:58,991 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-01-09 13:15:58,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local561414911_0001]
2015-01-09 13:15:59,067 [Thread-19] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-01-09 13:15:59,069 [Thread-19] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-01-09 13:15:59,069 [Thread-19] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-01-09 13:15:59,094 [Thread-19] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2015-01-09 13:15:59,257 [Thread-19] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2015-01-09 13:15:59,258 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local561414911_0001_m_000000_0
2015-01-09 13:15:59,459 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2015-01-09 13:15:59,470 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 134217728
Input split[0]:
Length = 134217728
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
-----------------------
2015-01-09 13:15:59,522 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://master:9000/wiki/infobox_properties_en.nt:0+134217728
2015-01-09 13:15:59,662 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-01-09 13:15:59,743 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records_person[10,17],records_person[-1,-1],null[-1,-1],records_union[13,16],records_infobox[6,18],records_infobox[-1,-1],result_filter[16,16],records_mappingbased[8,23],records_mappingbased[-1,-1],null[-1,-1] C: R:
2015-01-09 13:15:59,798 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject(ACCESSING_NON_EXISTENT_FIELD): Attempt to access field which was not found in the input
2015-01-09 13:15:59,815 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject(ACCESSING_NON_EXISTENT_FIELD): Attempt to access field which was not found in the input
2015-01-09 13:16:05,578 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:08,582 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:10,209 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:10,699 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local561414911_0001_m_000000_0 is done. And is in the process of committing
2015-01-09 13:16:10,714 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
2015-01-09 13:16:10,714 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task attempt_local561414911_0001_m_000000_0 is allowed to commit now
2015-01-09 13:16:10,849 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local561414911_0001_m_000000_0' to hdfs://master:9000/tmp/temp206201348/tmp-1297558267/_temporary/0/task_local561414911_0001_m_000000
2015-01-09 13:16:10,854 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map
2015-01-09 13:16:10,854 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local561414911_0001_m_000000_0' done.
2015-01-09 13:16:10,855 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local561414911_0001_m_000000_0
2015-01-09 13:16:10,855 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local561414911_0001_m_000001_0
2015-01-09 13:16:10,877 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2015-01-09 13:16:10,883 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
....
I had a similar problem. My mapred-site.xml was different, but I think the cause is the same.
YARN is the next generation of MapReduce, which is why the file needs the following section to make sure it works with older programs:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
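However, if you are using YARN you do not have a JobTracker, because it has in a sense been replaced by the ResourceManager (it is actually a complete redesign; you can read about it at http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/).
So you need to remove the following lines: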
<property>
<name>mapreduce.jobtracker.address</name>
<value>master:54311</value>
<description>...</description>
</property>
from the file, and Pig should be good to go.
(A related answer discusses this change: "Why is there a mapreduce.jobtracker.address configuration on YARN?")
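The two points above can be checked mechanically. The following is only a rough Python sketch, not part of Hadoop or Pig: the helper names `read_properties` and `check_mapred_site` and the sample XML are made up for illustration. It assumes that Hadoop 2.x selects the runtime through the key `mapreduce.framework.name` (which falls back to `local` when absent) and that the similarly named `mapred.framework.name` is not read at all.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_mapred_site(xml_text):
    """Hypothetical helper: list likely reasons a job is not submitted to YARN."""
    props = read_properties(xml_text)
    problems = []
    # Assumption: only this exact key switches MapReduce from local to YARN.
    if props.get("mapreduce.framework.name") != "yarn":
        problems.append("mapreduce.framework.name is not set to 'yarn'")
    # Assumption: this look-alike key is ignored by Hadoop 2.x.
    if "mapred.framework.name" in props:
        problems.append("mapred.framework.name is set, but is probably ignored")
    # From the answer above: YARN has no JobTracker to point at.
    if "mapreduce.jobtracker.address" in props:
        problems.append("mapreduce.jobtracker.address is set, but YARN has no JobTracker")
    return problems


# Sample input modelled on the mapred-site.xml shown in the question.
SAMPLE = """
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>master:54311</value>
  </property>
  <property>
    <name>mapred.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for problem in check_mapred_site(SAMPLE):
        print(problem)
```

To run it against a real file, read the file's contents into a string and pass that to `check_mapred_site`; an empty list means none of the three conditions was found.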
According to the log you posted here, your job is running on the local system (see the [LocalJobRunner] entries).
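One way to spot this without reading the whole log is to look at the job IDs. The sketch below is an illustration only; the function name `classify_job_ids` is invented here. It relies on the assumption that LocalJobRunner IDs start with `job_local`, whereas jobs submitted to YARN have IDs of the form `job_<cluster timestamp>_<sequence>`.

```python
import re

# Assumption: "job_local..." means LocalJobRunner; a purely numeric
# "job_<timestamp>_<sequence>" means the job went to the cluster.
JOB_ID = re.compile(r"\bjob_(local)?\d+_\d+\b")


def classify_job_ids(log_text):
    """Map every job ID found in the log text to 'local' or 'cluster'."""
    result = {}
    for match in JOB_ID.finditer(log_text):
        result[match.group(0)] = "local" if match.group(1) else "cluster"
    return result


if __name__ == "__main__":
    # One line copied from the log in the question.
    line = ("2015-01-09 13:15:57,791 [JobControl] INFO "
            "org.apache.hadoop.mapreduce.JobSubmitter - "
            "Submitting tokens for job: job_local561414911_0001")
    print(classify_job_ids(line))
```

If every ID in a Pig run comes back as `local`, nothing was handed to the ResourceManager, which matches the absence of the job in the YARN UI.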
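Pig has a property called pig.auto.local.enabled. When it is true and the input is smaller than the size set in pig.auto.local.input.maxbytes, Pig will not execute the job on the cluster (and the YARN UI will not show an application for it) but on the node that launched it. As far as I can tell, Pig 0.14 ships with this property disabled and a limit of about 100 MB, so check whether your setup overrides either value. You can set both properties in the pig.properties file.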
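The rule described above can be written down as a small Python sketch. This is a simplification for illustration, not Pig's actual code: the helper names `parse_properties` and `auto_local_applies` are hypothetical, only the two properties named above are considered, and any further conditions Pig applies are ignored. When either property is missing from the file, the function returns `None` instead of guessing at Pig's built-in defaults.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def auto_local_applies(props, input_bytes):
    """Simplified rule: the switch is on AND the input is below the limit.

    Returns None when a property is not set explicitly, because Pig's
    built-in defaults are deliberately not modelled here.
    """
    enabled = props.get("pig.auto.local.enabled")
    limit = props.get("pig.auto.local.input.maxbytes")
    if enabled is None or limit is None:
        return None
    return enabled.lower() == "true" and input_bytes < int(limit)


# Example pig.properties content; the values are chosen for illustration.
EXAMPLE = """
# keep small jobs on the cluster as well
pig.auto.local.enabled=false
pig.auto.local.input.maxbytes=100000000
"""

if __name__ == "__main__":
    props = parse_properties(EXAMPLE)
    # 134217728 is the length of the first input split in the posted log.
    print(auto_local_applies(props, 134217728))
```

Setting pig.auto.local.enabled=false explicitly, as in the example, rules this mechanism out as the cause regardless of input size.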