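I am new to Hive. I am using HBase-1.1.0, Hadoop-2.5.1, and Hive-0.13 for my requirement.
The setup works fine, and I am able to run Hive queries using Beeline.
Query: select count(*) from X_Table
The query completed in 37.848 seconds.
I set up the same environment in a Maven project and tried executing some select queries through the Hive JDBC client, and that worked fine. But when I execute the same count query, the MapReduce job never completes. It looks like the job is launched again instead. How can I resolve this?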
Code
Connection con = DriverManager.getConnection("jdbc:hive2://abc:10000/default","", "");
Statement stmt = con.createStatement();
String query = "select count(*) from X_Table";
ResultSet res = stmt.executeQuery(query);
while (res.next()) {
//code here
}
Log details:
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1429243611915_0030, Tracking URL = http://master:8088/proxy/application_1429243611915_0030/
Kill Command = /usr/local/pcs/hadoop/bin/hadoop job -kill job_1429243611915_0030
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-04-20 09:28:02,616 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:29:02,728 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:30:03,432 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:31:04,054 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:32:04,675 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:33:05,298 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:34:05,866 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:35:06,419 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:36:06,985 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:37:07,551 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:38:08,289 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:39:09,184 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:40:09,780 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:41:10,367 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:42:10,965 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:43:11,595 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:44:12,181 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:45:12,952 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:46:13,590 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:47:14,218 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:48:14,790 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:49:15,378 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:50:16,014 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:51:16,808 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:52:17,378 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:53:17,928 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:54:18,491 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:55:19,049 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:56:19,797 Stage-1 map = 0%, reduce = 0%
2015-04-20 09:57:20,344 Stage-1 map = 0%, reduce = 0%
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1429243611915_0031, Tracking URL = http://master:8088/proxy/application_1429243611915_0031/
Kill Command = /usr/local/pcs/hadoop/bin/hadoop job -kill job_1429243611915_0031
2015-04-20 09:58:20,858 Stage-1 map = 0%, reduce = 0%
If you increase the memory for these two settings in the yarn-site.xml file, the query will run quickly.
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
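The answer above works, and it really helped me a lot. I was trying to run a simple count(*) query in Hive, but it would neither fail nor complete. It just hung there until I killed the job from the command prompt. I was going completely crazy, and I could not find a proper answer on Google. So we need to increase the memory for:
- yarn.scheduler.maximum-allocation-mb
- yarn.nodemanager.resource.memory-mb
This can be done in yarn-site.xml, or in Cloudera Manager under the YARN service. After increasing the memory, restart all stale services. This will fix the problem.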
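
For anyone wondering why raising these two values can unblock a job stuck at map = 0%, here is a rough sketch of the arithmetic. It is only an illustration under assumptions, not how YARN is actually implemented: it assumes a single NodeManager, a scheduler that rounds each request up to a multiple of the minimum allocation, and one application master plus one map container. The class and method names are made up for this example, and the memory figures are hypothetical, not taken from the cluster in the question.

```java
// Hypothetical helper, not part of Hadoop or Hive.
class YarnMemoryFit {

    // Round a container request up to the next multiple of the minimum allocation
    // (assumed scheduler behaviour for this sketch).
    static int normalize(int requestMb, int minAllocMb) {
        return ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    // True if the application master and one map container can run together on one node.
    // nodeMemoryMb stands in for yarn.nodemanager.resource.memory-mb,
    // maxAllocMb stands in for yarn.scheduler.maximum-allocation-mb.
    static boolean canRunMap(int nodeMemoryMb, int maxAllocMb, int minAllocMb,
                             int amMb, int mapMb) {
        int am = normalize(amMb, minAllocMb);
        int map = normalize(mapMb, minAllocMb);
        if (am > maxAllocMb || map > maxAllocMb) {
            return false; // a single request is larger than the per-container ceiling
        }
        return am + map <= nodeMemoryMb; // both containers must fit on the node
    }

    public static void main(String[] args) {
        // Assumed sizes: 1536 MB application master, 1024 MB map task, 1024 MB minimum allocation.
        System.out.println(canRunMap(2048, 2048, 1024, 1536, 1024)); // false: no room left for the map task
        System.out.println(canRunMap(4096, 4096, 1024, 1536, 1024)); // true: 2048 + 1024 fits in 4096
    }
}
```

In the first case the application master alone rounds up to 2048 MB and fills the node, so the map container would wait indefinitely, which would look like the endless 0% lines in the log. Whether that is what happened on the asker's cluster is a guess; the ResourceManager UI at the Tracking URL shows the real pending and allocated memory.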