Choosing a Hive execution engine



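Of the three Hive execution engines shown below, which one is recommended when working on a Hadoop cluster? And what are the use cases where each one is the ideal choice?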

I tried a query on a sample of size 400M, and Tez returned the output faster than the other two engines. The query involved grouping and filtering.

set hive.execution.engine=spark;
set hive.execution.engine=tez;
set hive.execution.engine=mr;

What I am trying to get to is this: by looking at a query, I should be able to decide which engine will return results faster than the others.
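One way to reason about this from the query itself is to compare the plans each engine produces. The following is a minimal sketch, assuming a hypothetical table `sales` with columns `region` and `amount` (none of these names come from the question). `EXPLAIN` prints the plan without running the query, so the MapReduce stage chain can be compared against the Tez vertex graph for the same group-and-filter statement.

```sql
-- Hypothetical table and columns (sales, region, amount), for illustration only.
-- The engine setting is per session; the last SET before a statement wins.

-- Plan under MapReduce: look for the number of map/reduce stages.
SET hive.execution.engine=mr;
EXPLAIN
SELECT region, COUNT(*) AS orders, SUM(amount) AS total
FROM sales
WHERE amount > 100
GROUP BY region;

-- Plan under Tez: look for the vertices (Map N, Reducer N) and the edges between them.
SET hive.execution.engine=tez;
EXPLAIN
SELECT region, COUNT(*) AS orders, SUM(amount) AS total
FROM sales
WHERE amount > 100
GROUP BY region;
```

A plan only hints at relative cost. Actual timings depend on data size, cluster resources, and configuration, so running the same statement under each engine on representative data is still the more reliable comparison.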

The benefits Tez provides over the MapReduce execution engine when used with Hive are:
● Tez does not write intermediate results to HDFS between the steps of a Hive query. Tez models the
query as a Directed Acyclic Graph (DAG), and the output of one step is passed directly to the next
step in the graph, whereas the MapReduce engine runs a chain of jobs that each write their output to
HDFS before the next job reads it. Removing these I/O operations saves a lot of time when dealing
with large amounts of data.
● Tez and YARN together let you reuse containers, and objects held in them, instead of starting from
scratch for every task. If two tasks require the same object and run within the same container, the
object does not have to be created again and again; it can be reused. This leads to better
management of resources and also helps improve performance.
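As a rough illustration of the container reuse point, the settings below are a sketch rather than a recommended configuration. It assumes the cluster allows these properties to be set at session level; `tez.am.container.reuse.enabled` is a Tez property that is normally already enabled by default, and on many clusters it is managed by the administrator in `tez-site.xml` instead.

```sql
-- Sketch only: session-level settings, assuming the cluster permits overriding them.
SET hive.execution.engine=tez;

-- Ask the Tez application master to reuse containers across tasks
-- rather than requesting a new container from YARN for each one.
-- AM-level Tez settings may only take effect when a new Tez session starts.
SET tez.am.container.reuse.enabled=true;
```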

For the Spark engine, see this thread:

https://community.cloudera.com/t5/Support-Questions/Hive-execution-engine-set-to-Spark-is-recommended/m-p/177906

If you want to run interactive queries, LLAP (Live Long and Process) is the suitable choice.
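A minimal sketch of routing a session's queries to LLAP, assuming a Hive version that ships LLAP and a cluster where the LLAP daemons are already deployed and running (neither is stated in the thread). LLAP runs on top of Tez, so the engine setting stays `tez`.

```sql
-- Sketch only: assumes LLAP daemons are already running on the cluster.
SET hive.execution.engine=tez;

-- Run query fragments in the long-lived LLAP daemons instead of fresh containers.
SET hive.execution.mode=llap;

-- Which parts of a query go to LLAP; "all" sends everything that can run there.
SET hive.llap.execution.mode=all;
```

On managed distributions these values are often preset for the interactive HiveServer2 endpoint, in which case connecting to that endpoint is enough and no session-level settings are needed.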
