What does this Hadoop MapReduce job information mean?



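I ran a Hadoop MapReduce word count job on 1 MB of data. I have a few questions about the output below:

  • What are counters?
  • Why are there two map tasks? As far as I know, the number of map tasks is determined by the number of input splits, and the minimum input split size is 64 MB. So logically there should be only one map task!?

  • What is the size of the reducers' output data?

  • CPU time spent: which CPU does this refer to, given that each task tracker has its own CPU and memory?

Thanks a lot!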

    [user1@li417-43 ~]$ hadoop jar wordcount1.jar wordcount1.WordCount -D mapred.reduce.tasks=10 wordin wordout10-1m
    14/12/16 19:55:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    14/12/16 19:55:46 INFO mapred.FileInputFormat: Total input paths to process : 1
    14/12/16 19:55:46 INFO mapred.JobClient: Running job: job_201405031326_0032
    14/12/16 19:55:47 INFO mapred.JobClient:  map 0% reduce 0%
    14/12/16 19:55:59 INFO mapred.JobClient:  map 100% reduce 0%
    14/12/16 19:56:04 INFO mapred.JobClient:  map 100% reduce 40%
    14/12/16 19:56:09 INFO mapred.JobClient:  map 100% reduce 80%
    14/12/16 19:56:14 INFO mapred.JobClient:  map 100% reduce 100%
    14/12/16 19:56:15 INFO mapred.JobClient: Job complete: job_201405031326_0032
    14/12/16 19:56:15 INFO mapred.JobClient: Counters: 34
    14/12/16 19:56:15 INFO mapred.JobClient:   File System Counters
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of bytes read=2008100
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of bytes written=5988058
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of read operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of large read operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of write operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of bytes read=1005254
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of bytes written=140119
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of read operations=14
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of large read operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of write operations=20
    14/12/16 19:56:15 INFO mapred.JobClient:   Job Counters
    14/12/16 19:56:15 INFO mapred.JobClient:     Launched map tasks=2
    14/12/16 19:56:15 INFO mapred.JobClient:     Launched reduce tasks=10
    14/12/16 19:56:15 INFO mapred.JobClient:     Data-local map tasks=1
    14/12/16 19:56:15 INFO mapred.JobClient:     Rack-local map tasks=1
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=12953
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=49609
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/12/16 19:56:15 INFO mapred.JobClient:   Map-Reduce Framework
    14/12/16 19:56:15 INFO mapred.JobClient:     Map input records=35293
    14/12/16 19:56:15 INFO mapred.JobClient:     Map output records=181014
    14/12/16 19:56:15 INFO mapred.JobClient:     Map output bytes=1646012
    14/12/16 19:56:15 INFO mapred.JobClient:     Input split bytes=206
    14/12/16 19:56:15 INFO mapred.JobClient:     Combine input records=0
    14/12/16 19:56:15 INFO mapred.JobClient:     Combine output records=0
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce input groups=14276
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce shuffle bytes=2008160
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce input records=181014
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce output records=14276
    14/12/16 19:56:15 INFO mapred.JobClient:     Spilled Records=362028
    14/12/16 19:56:15 INFO mapred.JobClient:     CPU time spent (ms)=26020
    14/12/16 19:56:15 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1427562496
    14/12/16 19:56:15 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=8291246080
    14/12/16 19:56:15 INFO mapred.JobClient:     Total committed heap usage (bytes)=477896704
    14/12/16 19:56:15 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
    14/12/16 19:56:15 INFO mapred.JobClient:     BYTES_READ=1002479

  1. Counters: 34 is the number of counters, that is, the number of counter lines listed after it.

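  2. I think this is due to speculative execution (search for "speculative" at [https://developer.yahoo.com/hadoop/tutorial/module4.html]). With speculative execution, Hadoop can launch a second attempt of the same mapper and keep whichever attempt finishes first; the other attempt is then killed. You can disable this by setting the mapred.map.tasks.speculative.execution configuration property to false in mapred-site.xml.

One mapper was launched on the node that holds the data, and the second on another node in the same rack (Data-local map tasks=1, Rack-local map tasks=1).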
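Another possibility worth checking, which the log alone does not confirm: the `mapred.FileInputFormat` lines suggest the job uses the old `mapred` API, and in Hadoop 1.x that API, as far as I know, sizes splits as max(minSize, min(totalSize / numSplits, blockSize)), where numSplits comes from `mapred.map.tasks` (default 2). The sketch below only reproduces that arithmetic in Python. The function name and parameters are hypothetical, and using BYTES_READ as the file size is an assumption.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def old_api_split_lengths(total_size, num_splits_hint, block_size, min_size=1):
    """Sketch of old-API split sizing for a single file (not Hadoop code)."""
    # Target size per split, derived from the requested number of map tasks.
    goal_size = total_size // max(num_splits_hint, 1)
    # Never below the minimum, never above one block.
    split_size = max(min_size, min(goal_size, block_size))

    lengths = []
    remaining = total_size
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining:
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    file_size = 1002479             # BYTES_READ from the log (assumed file size)
    block_size = 64 * 1024 * 1024   # 64 MB block, as in the question
    print(old_api_split_lengths(file_size, 2, block_size))  # hint of 2 map tasks
    print(old_api_split_lengths(file_size, 1, block_size))  # hint of 1 map task
```

Under these assumptions a hint of 2 yields two splits of roughly 500 KB each, while a hint of 1 yields a single split. If that is what happened here, both map tasks did real work, and changing the speculative execution setting would not reduce the count.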

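  3. Your reducers wrote 14276 lines in total (Reduce output records=14276).

  4. CPU time spent (ms) is the sum of the CPU time consumed by every task on every node. It is mainly useful for comparing jobs.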
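To work with these numbers rather than read them by eye, the counter lines can be pulled out of the JobClient output. This is a minimal sketch: `parse_counters` and the regular expression are my own helpers, not part of Hadoop, and the sample lines are copied from the log in the question. Treating "HDFS: Number of bytes written" as the reducers' output size assumes nothing else in the job wrote to HDFS, and the per-task average assumes every launched task contributed to the CPU counter.

```python
import re

# Matches counter lines printed by JobClient, for example:
# "14/12/16 19:56:15 INFO mapred.JobClient:     Launched map tasks=2"
COUNTER_LINE = re.compile(r"mapred\.JobClient:\s+(?P<name>.+)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for every 'name=value' log line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


SAMPLE_LOG = """\
14/12/16 19:56:15 INFO mapred.JobClient: Counters: 34
14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of bytes written=140119
14/12/16 19:56:15 INFO mapred.JobClient:     Launched map tasks=2
14/12/16 19:56:15 INFO mapred.JobClient:     Launched reduce tasks=10
14/12/16 19:56:15 INFO mapred.JobClient:     Reduce output records=14276
14/12/16 19:56:15 INFO mapred.JobClient:     CPU time spent (ms)=26020
"""

if __name__ == "__main__":
    c = parse_counters(SAMPLE_LOG)
    tasks = c["Launched map tasks"] + c["Launched reduce tasks"]
    print("tasks launched:", tasks)
    # CPU time is summed over all tasks, so divide to get a rough per-task figure.
    print("average CPU ms per task:", c["CPU time spent (ms)"] / tasks)
    # Output size in bytes, under the assumption stated above.
    print("output bytes:", c["HDFS: Number of bytes written"])
    print("average bytes per output record:",
          c["HDFS: Number of bytes written"] / c["Reduce output records"])
```

The "Counters: 34" line is skipped on purpose because it has no `name=value` pair; it is only the count of the lines that follow.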
