I would like to know the disadvantages of MapReduce 1 (MR1) compared with MR2.
Here are two exciting and significant additions to the Hadoop framework:
• HDFS Federation: provides a name service that is both scalable and reliable.
• YARN (Yet Another Resource Negotiator): divides the two major functions of the JobTracker (resource management and job life-cycle management) into separate components.
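A key problem in Hadoop 1.x was providing a highly available NameNode. Hadoop 2.x addresses availability with NameNode HA, and HDFS Federation additionally allows the namespace workload to be distributed, because NameNodes can now scale horizontally.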
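To illustrate how federation spreads the namespace, here is a minimal sketch, assuming a client-side mount table that maps path prefixes to independent NameNodes. In practice Hadoop's ViewFs plays this role; the function `resolve_namenode`, the mount points and the NameNode names `nn1`..`nn3` below are hypothetical and not the ViewFs API.

```python
# Hypothetical mount table: each prefix is owned by an independent NameNode
# that manages its own slice of the namespace.
MOUNT_TABLE = {
    "/user": "nn1",
    "/data": "nn2",
    "/data/archive": "nn3",  # nested mount: more specific than "/data"
}


def resolve_namenode(path: str, mount_table: dict[str, str]) -> str:
    """Return the NameNode that owns `path`, using longest-prefix matching."""
    best = None
    for mount_point in mount_table:
        # A mount point matches if the path equals it or lies beneath it,
        # so "/database" does not match "/data".
        if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[best]
```

For example, `resolve_namenode("/data/logs", MOUNT_TABLE)` resolves to `nn2`, while `/data/archive/2011` resolves to `nn3`, so metadata load for the two subtrees lands on different NameNodes.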
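YARN provides a logical separation of responsibilities for negotiating resources and executing jobs across a Hadoop cluster. The end result is a new, more general resource-management framework that is suitable for more than just MapReduce jobs.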
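The separation of resource management from job life-cycle management can be sketched as a toy model. This assumes greatly simplified behavior: a hypothetical `ResourceManager` that only hands out memory, and a hypothetical per-application `ApplicationMaster` that decides which tasks to run. These classes are illustrative and are not the Hadoop YARN API; the point is that the resource manager never needs to know whether an application is MapReduce or another framework (for example BSP).

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical, simplified RM: tracks free capacity and grants
    containers. It knows nothing about map/reduce semantics."""
    free_memory_mb: int

    def allocate(self, memory_mb: int) -> bool:
        # Grant the container only if enough capacity remains.
        if memory_mb <= self.free_memory_mb:
            self.free_memory_mb -= memory_mb
            return True
        return False


@dataclass
class ApplicationMaster:
    """Hypothetical per-application master: owns the job's life cycle
    (which tasks to launch) and asks the RM for containers."""
    app_id: str
    framework: str  # e.g. "mapreduce" or "bsp"; the RM does not care
    task_memory_mb: list[int]
    running: list[int] = field(default_factory=list)

    def request_containers(self, rm: ResourceManager) -> int:
        """Try to launch every task; return how many got a container."""
        launched = 0
        for mem in self.task_memory_mb:
            if rm.allocate(mem):
                self.running.append(mem)
                launched += 1
        return launched
```

With `ResourceManager(free_memory_mb=8192)`, a MapReduce application asking for four 1024 MB containers gets all four, and a BSP application then asking for three 2048 MB containers gets two, after which the cluster is fully allocated.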
Here are some articles on the topic:
http://blog.cloudera.com/blog/2012/02/mapreduce-2-0-in-hadoop-0-23/
http://hortonworks.com/blog/introducing-apache-hadoop-yarn/
Hadoop 1.x is all about MapReduce, meaning you can run only MapReduce jobs. YARN is more general than MR, and it makes it possible to run other computing models, such as BSP (Bulk Synchronous Parallel), besides MR. Before YARN, MR, BSP and other frameworks each required a separate cluster. Now they can coexist in a single cluster, which leads to higher cluster utilization. Several other applications have been ported to YARN as well.
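A back-of-the-envelope calculation shows why sharing one cluster raises utilization. The sketch assumes two frameworks whose hourly demand (in containers) peaks at different times; the numbers are invented for illustration, not measurements.

```python
# Hypothetical hourly demand, in containers, for two frameworks whose
# peaks do not overlap. These numbers are made up for illustration.
mr_demand = [100, 20, 100, 20]
bsp_demand = [20, 100, 20, 100]
hours = len(mr_demand)

# Before YARN: each framework gets its own cluster, sized for its own peak.
dedicated_capacity = max(mr_demand) + max(bsp_demand)

# With YARN: both frameworks share one cluster, sized for the combined peak.
shared_demand = [m + b for m, b in zip(mr_demand, bsp_demand)]
shared_capacity = max(shared_demand)

# Container-hours of work actually needed (identical in both setups).
total_work = sum(mr_demand) + sum(bsp_demand)

dedicated_util = total_work / (dedicated_capacity * hours)
shared_util = total_work / (shared_capacity * hours)

print(f"dedicated: {dedicated_capacity} containers, utilization {dedicated_util:.0%}")
print(f"shared:    {shared_capacity} containers, utilization {shared_util:.0%}")
```

Under these assumptions the dedicated setup needs 200 containers at 60% utilization, while the shared cluster serves the same work with 120 containers at 100%. Real workloads overlap less neatly, so the gain is usually smaller.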
In Hadoop 1.x (MR1), the JobTracker views the cluster as a set of nodes (each managed by a TaskTracker) with distinct map slots and reduce slots, which are not fungible: a map task cannot use a free reduce slot, and vice versa. Utilization suffers because map slots may be full while reduce slots sit empty (and vice versa). Fixing this was necessary so that the entire cluster could be used at its full capacity.
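The slot problem can be shown with two tiny functions. This is a simplified model, assuming every task needs the same amount of resources; real YARN sizes containers by memory (and CPU), and `mr1_running` and `yarn_running` are hypothetical helpers, not Hadoop APIs.

```python
def mr1_running(pending_maps: int, pending_reduces: int,
                map_slots: int, reduce_slots: int) -> int:
    """MR1: map tasks may only use map slots, reduce tasks only reduce slots."""
    return min(pending_maps, map_slots) + min(pending_reduces, reduce_slots)


def yarn_running(pending_maps: int, pending_reduces: int, containers: int) -> int:
    """YARN (simplified): any task can run in any free, equally sized container."""
    return min(pending_maps + pending_reduces, containers)
```

On a node with 4 map slots and 4 reduce slots, a map-heavy phase with 8 pending maps and no reduces runs only 4 tasks under MR1 (half the node idle), whereas the same node modeled as 8 generic containers runs all 8.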
YARN also makes it possible to run different versions of MapReduce in the same cluster, which is not possible with legacy MR and makes maintenance easier.
I think the main difficulty with MR1 is that it is hard to implement algorithms that require globally shared state.
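An iterative algorithm such as k-means shows the problem: mappers can only read the shared state (the centroids), never update it, so each iteration has to be a separate job and the driver must hand the new state to the next job. The sketch below simulates this in memory with hypothetical `map_fn`, `reduce_fn`, `run_job` and `kmeans_driver` functions on 1-D points; it is not Hadoop code, and on a real MR1 cluster the state would typically be written out between jobs and re-read by the next one.

```python
from collections import defaultdict


def map_fn(point: float, centroids: list[float]) -> tuple[int, float]:
    """Emit (index of nearest centroid, point). Centroids are read-only here."""
    nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
    return nearest, point


def reduce_fn(key: int, points: list[float]) -> tuple[int, float]:
    """Emit the new centroid for one cluster: the mean of its points."""
    return key, sum(points) / len(points)


def run_job(points: list[float], centroids: list[float]) -> list[float]:
    """One simulated MapReduce job: map, shuffle (group by key), reduce."""
    groups: dict[int, list[float]] = defaultdict(list)
    for p in points:
        key, value = map_fn(p, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # empty clusters keep their old centroid
    for key, values in groups.items():
        _, centroid = reduce_fn(key, values)
        new_centroids[key] = centroid
    return new_centroids


def kmeans_driver(points: list[float], centroids: list[float],
                  max_iters: int = 20) -> list[float]:
    """The global state lives only in the driver and is passed to each new job."""
    for _ in range(max_iters):
        new_centroids = run_job(points, centroids)
        if new_centroids == centroids:  # converged: state stopped changing
            break
        centroids = new_centroids
    return centroids
```

For the points `[1.0, 2.0, 3.0, 10.0, 11.0, 12.0]` with initial centroids `[1.0, 2.0]`, the driver launches three jobs before the centroids settle at `[2.0, 11.0]`; every one of those jobs pays the full job start-up and I/O cost just to pass the updated state along.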