We connect to Hive from an Eclipse project through the JDBC API to read Hive tables. Here is the code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

import org.testng.annotations.Test;

public class FetchHiveData_test {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    @Test
    public void FetchHiveDataMethod() {
        ResultSet hiveres = null;
        try {
            System.out.println("Inside Hive Method");
            // Register the HiveServer2 JDBC driver
            Class.forName(driverName);

            /*********** Hive ***************/
            // Host, port, user, and password are placeholders
            Connection con = DriverManager.getConnection(
                    "jdbc:hive2://XXXXX:20000", "xxxxxx", "yyyyy");
            Statement stmt = con.createStatement();
            String sql = "select count(*) from table";
            hiveres = stmt.executeQuery(sql);

            // Print the column headers
            ResultSetMetaData rsmd = hiveres.getMetaData();
            int numCols = rsmd.getColumnCount();
            for (int j = 1; j <= numCols; j++) {
                System.out.print(rsmd.getColumnName(j) + " ");
            }

            // Print each row, one element at a time
            while (hiveres.next()) {
                for (int i = 1; i <= numCols; i++) {
                    System.out.print(hiveres.getString(i) + " ");
                }
                System.out.println(); // move to the next line for the next row
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The approach below uses a Spark context, but I am not sure where it picks up the server name and credentials. How can I modify the program above to make use of Spark? The goal is to make our queries run faster, since the JDBC API is somewhat slow.
Spark code:
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;
import org.testng.annotations.Test;

public class SparkTest {
    @SuppressWarnings("serial")
    @Test
    public void f() {
        // Master URL and application name (host and port are placeholders)
        final SparkConf sparkConf = new SparkConf()
                .setMaster("xxxxx:20000")
                .setAppName("HiveConnector");
        final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);

        // Hive-aware SQL context built on top of the Spark context
        JavaHiveContext hiveCtx = new JavaHiveContext(sparkContext);
        JavaSchemaRDD rdd = hiveCtx.sql("Select count(*) from table");

        // Extract the single count column from each result row
        JavaRDD<Integer> keys = rdd.map(new Function<Row, Integer>() {
            public Integer call(Row row) {
                return row.getInt(0);
            }
        });

        List<Integer> res = keys.collect();
        for (Integer val : res) {
            System.out.println("val " + val);
        }
    }
}
Try running the Thrift JDBC/ODBC server described at http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server, and you will be able to access it over JDBC just as you access Hive today.
Note: this may not support all of HiveQL.
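For reference, here is a minimal sketch of what that looks like from the Java side, assuming the Thrift server has been started with sbin/start-thriftserver.sh and is listening on its default port of 10000; the host, credentials, and table name are placeholders, and the class name is made up. Because the server implements the HiveServer2 protocol, the only real change to the JDBC program above is the connection URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftServerTest {
    public static void main(String[] args) throws Exception {
        // Same HiveServer2 JDBC driver used in the original program
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Point the URL at the Spark Thrift server (default port 10000)
        // instead of HiveServer2; host and credentials are placeholders
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://XXXXX:10000", "xxxxxx", "yyyyy");
        Statement stmt = con.createStatement();
        // The query is planned and executed by Spark SQL rather than
        // the Hive execution engine
        ResultSet rs = stmt.executeQuery("select count(*) from table");
        while (rs.next()) {
            System.out.println("count = " + rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}

The Thrift server reads the metastore location and other Hive settings from the hive-site.xml in Spark's conf/ directory, which is also where a HiveContext (as in the Spark code above) would pick them up, so neither the server name nor the credentials have to be baked into the application code.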