Getting the number of input and output records of a MapReduce job in Java

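I want to get the number of input and output records of the map phase and the reduce phase, as well as the time the map/reduce job took to complete, using Java. These statistics are printed to the terminal, but I need to retrieve them in Java code and display them in my own interface, right after the line: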

job_blocking.waitForCompletion(true);

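Once that call returns, you can read the record counts from the job's built-in counters MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS, REDUCE_INPUT_RECORDS and REDUCE_OUTPUT_RECORDS (here job is the same Job object that the question calls job_blocking):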

// Assumes Hadoop 2.x or later, where the built-in task counters are exposed as the
// org.apache.hadoop.mapreduce.TaskCounter enum (import it along with
// org.apache.hadoop.mapreduce.Counters). "job" is the Job that has just completed.
Counters counters = job.getCounters();
long map_input_records = counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
long map_output_records = counters.findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
long reduce_input_records = counters.findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue();
long reduce_output_records = counters.findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue();

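As for the time the job takes to run, I don't know of another (easier) way than storing the current time in a long variable before and after the execution and taking the difference.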
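The before-and-after timing described above could be sketched roughly as follows. This is only an illustration: JobTimer, TimedResult and time are hypothetical names that do not come from Hadoop, and the Callable stands in for the blocking call job.waitForCompletion(true) so that the example runs without a cluster. It uses System.nanoTime() because that clock is monotonic, whereas System.currentTimeMillis() can jump if the system clock is adjusted.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Hadoop): times any blocking call.
public class JobTimer {

    /** Outcome of a timed run: the call's boolean result and the wall-clock duration. */
    public static final class TimedResult {
        public final boolean succeeded;
        public final long elapsedMillis;

        TimedResult(boolean succeeded, long elapsedMillis) {
            this.succeeded = succeeded;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the blocking call and measures how long it takes. */
    public static TimedResult time(Callable<Boolean> blockingCall) throws Exception {
        long start = System.nanoTime();            // timestamp before execution
        boolean ok = blockingCall.call();          // e.g. job.waitForCompletion(true)
        long elapsedNanos = System.nanoTime() - start; // difference after execution
        return new TimedResult(ok, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for: time(() -> job.waitForCompletion(true))
        TimedResult result = time(() -> {
            Thread.sleep(50); // simulated work
            return true;
        });
        System.out.println("succeeded=" + result.succeeded
                + ", elapsed ms=" + result.elapsedMillis);
    }
}
```

In a real driver the call would presumably look like JobTimer.time(() -> job_blocking.waitForCompletion(true)), after which elapsedMillis can be shown next to the counter values. Recent Hadoop versions also appear to expose getStartTime() and getFinishTime() on Job, which may be worth checking against the version in use.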
