Unable to use the mongo-hadoop connector in Hive queries on HDP



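I am new to Hadoop. I have installed the Hortonworks Sandbox 2.1 and am trying to run a Hive script from the Hive UI. I want to access a MongoDB collection from Hive, and I used the following query for this: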

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');

I added mongo-java-driver-2.12.2.jar, mongo-hadoop-core-1.3.0.jar and mongo-hadoop-hive-1.3.0.jar as file resources. However, when I run the query, it fails with the following error:

15/03/11 04:38:24 INFO exec.DDLTask: Use StorageHandler-supplied com.mongodb.hadoop.hive.BSONSerDe for table individuals
15/03/11 04:38:24 ERROR exec.DDLTask: java.lang.NoClassDefFoundError: com/mongodb/util/JSON
    at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:107)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:276)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:626)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:593)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.execute(BeeswaxServiceImpl.java:349)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:614)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:603)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1537)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1.run(BeeswaxServiceImpl.java:603)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Can someone help me figure out what I am missing here?

Thanks in advance.

You need to map all the fields in your MongoDB collection, not just "_id":

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"<corresponding name in your collection>", "age":"<same here>", etc...}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');
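
As a concrete illustration of the mapping above, here is a minimal sketch. It assumes the MongoDB documents use field names identical to the Hive column names (name, age, city, hobby); those field names are an assumption, not something stated in the question, so substitute whatever your collection actually uses. `<hostIP>` is left as a placeholder, as in the question.

```sql
-- Sketch only: the right-hand field names ("name", "age", "city", "hobby")
-- are assumed to match the MongoDB documents; adjust them to your collection.
CREATE TABLE individuals
(
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","age":"age","city":"city","hobby":"hobby"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');
```

Each key in the mapping is a Hive column and each value is the corresponding field in the MongoDB document, so only the values need to change if the collection uses different names.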
