I am using RJDBC 0.2-5 to connect to Hive from RStudio. My server runs hadoop-2.4.1 and hive-0.14. I followed the steps below to connect to Hive.
library(DBI)
library(rJava)
library(RJDBC)
.jinit(parameters="-DrJava.debug=true")
drv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver",
c("/home/packages/hive/New folder3/commons-logging-1.1.3.jar",
"/home/packages/hive/New folder3/hive-jdbc-0.14.0.jar",
"/home/packages/hive/New folder3/hive-metastore-0.14.0.jar",
"/home/packages/hive/New folder3/hive-service-0.14.0.jar",
"/home/packages/hive/New folder3/libfb303-0.9.0.jar",
"/home/packages/hive/New folder3/libthrift-0.9.0.jar",
"/home/packages/hive/New folder3/log4j-1.2.16.jar",
"/home/packages/hive/New folder3/slf4j-api-1.7.5.jar",
"/home/packages/hive/New folder3/slf4j-log4j12-1.7.5.jar",
"/home/packages/hive/New folder3/hive-common-0.14.0.jar",
"/home/packages/hive/New folder3/hadoop-core-0.20.2.jar",
"/home/packages/hive/New folder3/hive-serde-0.14.0.jar",
"/home/packages/hive/New folder3/hadoop-common-2.4.1.jar"),
identifier.quote="`")
conHive <- dbConnect(drv, "jdbc:hive://myserver:10000/default",
"usr",
"pwd")
But I always get the following error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
I even tried different versions of the Hive jars, and hive-jdbc-standalone.jar, but nothing seems to work. I also tried RHive to connect to Hive, without success.
Can anyone help me? I am a bit stuck :( I did not try rHive because it seems to require a complicated installation on all nodes of the cluster.
I successfully connected to Hive using RJDBC. Here is a code snippet that works on my Hadoop 2.6 CDH 5.4 cluster:
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar",
       "/usr/lib/hadoop/client/hadoop-common.jar",
       "/usr/lib/hive/lib/libthrift-0.9.2.jar",
       "/usr/lib/hive/lib/hive-service.jar",
       "/usr/lib/hive/lib/httpclient-4.2.5.jar",
       "/usr/lib/hive/lib/httpcore-4.2.5.jar",
       "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#initializing the connection
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases
The hardest part is finding all the required jars and figuring out where they live…
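As a quick sketch of that search step (the directories below are assumptions based on a CDH-style layout; adjust them for your distribution), you can list candidate jars from R before building the classpath:

```r
# Scan typical CDH install directories for the jars RJDBC needs.
# These paths are examples; substitute the lib directories of your install.
jar_dirs <- c("/usr/lib/hive/lib", "/usr/lib/hadoop/client")
found <- unlist(lapply(jar_dirs, function(d)
  list.files(d, pattern = "(hive-jdbc|hadoop-common).*\\.jar$", full.names = TRUE)))
found  # character vector of matching jar paths (empty if none found)
```

On machines where those directories do not exist, `list.files` simply returns an empty character vector, so the snippet is safe to run anywhere.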
The Hive standalone jar contains almost everything needed: this standalone jar plus the hadoop-common jar is enough to use Hive.
So here is a simplified version that only needs hadoop-common and the hive standalone jar, without worrying about the other jars:
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#initializing the connection
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc-standalone.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases
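Once connected, the usual DBI verbs work over the same connection. A short sketch of further use (the open `conn` comes from the snippet above, and `mytable` is a placeholder name, so this requires a live HiveServer2):

```r
# Assumes an open `conn` from the snippet above; "mytable" is a placeholder.
dbListTables(conn)                       # tables visible in the current database
preview <- dbGetQuery(conn, "SELECT * FROM mytable LIMIT 10")
str(preview)                             # inspect the fetched data.frame
dbDisconnect(conn)                       # close the HiveServer2 session
```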
ioicmathieu's answer now works for me after I switched to an older Hive jar, for example from 3.1.1 to 2.0.0.
Unfortunately I cannot comment on his answer, which is why I have written another one.
If you encounter the following error, try an older version:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://host_name: Could not establish connection to jdbc:hive2://host_name:10000: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})
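A minimal sketch of the downgraded setup (the 2.0.0 jar path below is an assumption; point it at wherever you placed the older standalone jar, and this needs a reachable HiveServer2):

```r
library(DBI)
library(rJava)
library(RJDBC)
# Path to the downgraded 2.0.0 driver is an example; adjust to your install.
old_jar <- "/opt/jars/hive-jdbc-2.0.0-standalone.jar"
.jinit(classpath = c("/usr/lib/hadoop/client/hadoop-common.jar", old_jar))
drv  <- JDBC("org.apache.hive.jdbc.HiveDriver", old_jar, identifier.quote = "`")
conn <- dbConnect(drv, "jdbc:hive2://host_name:10000/default", "myuser", "")
```

The "Required field 'client_protocol' is unset" message indicates a Thrift protocol mismatch between a newer JDBC client and an older HiveServer2, which is why matching the client jar version to the server resolves it.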