使用 'Order by' 或 'Group by' 时查询 Hive 表时出错



我最近安装了DBvisualizier来查询Hive表。我把它安装在我的Mac上,并从这个网站下载/安装了Hive的jdbc jar文件:https://s3.amazonaws.com/public-repo-1.hortonworks.com/HDP/hive-jdbc4/1.0.42.1054/Simba_HiveJDBC41_1.0.42.1054.zip

当我连接到我们的数据库并测试查询时。一个简单的选择就可以了:

select *
from table_name
limit 10

但是当我添加"排序依据"或"分组依据"时:

select *
from table_name
order by rollingtime
limit 10

我得到了以下错误,我不知道为什么。有没有人遇到类似的错误并知道如何解决这个问题?

09:56:17 START Executing for: 'NewDev' [Hive], Database: Hive, Schema: sdc
09:56:17 FAILED [SELECT - 0 rows, 0.504 secs] [Code: 500051, SQL State: HY000] [Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 2, SQL state: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1516123265840_0008_8_00, diagnostics=[Task failed, taskId=task_1516123265840_0008_8_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.NoClassDefFoundError: Could not initialize class org.apache.tez.runtime.library.api.TezRuntimeConfiguration
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:107)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
, errorMessage=Cannot recover from this error:java.lang.NoClassDefFoundError: Could not initialize class org.apache.tez.runtime.library.api.TezRuntimeConfiguration
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:107)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1516123265840_0008_8_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1516123265840_0008_8_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1516123265840_0008_8_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1, Query: select *
from nomura_qa_mblock_capacity_stage
order by rollingtime
limit 10.  
select *
from nomura_qa_mblock_capacity_stage
order by rollingtime
limit 10;
09:56:17 END Execution 1 statement(s) executed, 0 row(s) affected, exec/fetch time: 0.504/0.000 secs [0 successful, 1 errors]

这有时是由于无法写入配置或硬编码的.staging目录引起的,通常是由于路径不正确或权限无效。

mapreduce exec 引擎比 tez 引擎更详细,有助于识别罪魁祸首,您可以通过在 Hive shell 中运行此查询来选择:

SET hive.execution.engine=mr

然后,您可能会看到以下错误:

权限被拒绝:用户=数据库用户,访问=写入, inode="/user/dbuser/.staging":hdfs:hdfs:drwxr-xr-x

在这种情况下,"dbuser"暂存目录被指定为不存在的路径,它应该被/home/dbuser/.staging

在运行时,在执行任何执行需要暂存的操作(排序、排序、分组、分发等)的查询以及将 exec 引擎设置为mr(如前所示)之前,您需要通过运行以下查询将暂存路径设置为有效的父目录,例如用户的主目录目录:

SET yarn.app.mapreduce.am.staging-dir=/home/dbuser/.staging

根据版本和环境,如果该指令不起作用,您可以尝试

SET hive.exec.stagingdir=/home/dbuser/.staging

当然,将"dbuser"更改为您的实际主目录(或您具有读/写访问权限的任何其他目录)。将自动创建 .staging 目录(假定运行查询的用户具有写入访问权限)。

更多信息请访问 http://doc.mapr.com/display/MapR/Default+mapred+Parameters

相关内容

最新更新