我在机器上安装了Cassandra和Spark with SparkSQL。Spark SQL支持JOIN关键字
https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkSqlSupportedSyntax.html
Spark SQL支持的语法以下语法定义SELECT查询
SELECT[DISTINCT][列名]|[通配符]FROM[kesypace姓名。]表名[JOIN子句表名ON联接条件][WHERE条件][GROUP BY列名][HAVING条件][ORDER BY列名称
我有以下代码
SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
Session session = connector.openSession();
ResultSet results;
String sql ="";
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
sql = "SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID ALLOW FILTERING;";
results = session.execute(sql);
我得到以下错误
原因:com.datatax.river.core.exceptions.SyntaxError:第1:25行在","缺少EOF(SELECT*from siem.report[,]siem…)上午11:14com.datatax.river.core.exceptions.SyntaxError:行1:25缺少EOF在","(SELECT*from siem.report[,]siem…)com.datatax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)在com.datatax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)在com.datatax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)在com.datatax.driver.core.DefaultResultSetFuture.getUnterruptible(DefaultResultSetFutures.java:245)在com.datatax.driver.core.AbstractSession.execute(AbstractSession.java:63)在com.datatax.driver.core.AbstractSession.execute(AbstractSession.java:39)位于的sun.reflect.NativeMethodAccessorImpl.invoke0(本机方法)sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)在sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)位于java.lang.reflect.Method.ioke(Method.java:483)com.datatax.spark.connecter.cql.SessionProxy.invoke(SessionProxy.scala:33)网址:com.sun.proxy.$Proxy59.execute(未知来源)com.ge.prdix.rmd.siem.boot.PersistenceTest.test_QuerySparkOnReport_GIACOMO_LogDao(PersistenceTest.java:178)位于的sun.reflect.NativeMethodAccessorImpl.invoke0(本机方法)sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)在sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)位于java.lang.reflect.Method.ioke(Method.java:483)org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)在org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)在org.junit.runners.model.FrameworkMethod.invokeExploly(FrameworkMethod.java:47)在org.junit.internal.runners.statements.InvokeMethod.eevaluate(InvokeMethod.java:17)在org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:73)在org.springframework.test.context.junit4.statements
也尝试过
SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID ALLOW FILTERING
也尝试过
SELECT * from siem.report R JOIN siem.netstat N on R.REPORTUUID = N.NETSTATREPORTUUID ALLOW FILTERING
有人能帮我吗?我真的在用SparkSQL还是CQL?
更新
我试过
public void test_JOIN_on_Cassandra () {
SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);
try {
//QueryExecution test1 = sqlContext.executeSql("SELECT * from siem.report");
//QueryExecution test2 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID");
QueryExecution test3 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");
} catch (Exception e) {
e.printStackTrace();
}
// SchemaRDD results = sc.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");
}
我得到
===分析的逻辑计划==='项目[未解决的问题()]+-'加入内部,某些('siem.report.REPORTUUID='siem.netstat.NETSTEPORTUUID):-'未解析关系CCD_ 1。
report
,无+-'未解析关系siem
。netstat
,无===分析的逻辑计划===org.apache.spark.sql.catalyst.analysis.UnresolvedException:无效调用未解析对象的toAttribute,树:unresolvedialias()'项目[未解决的问题(*)]+-'加入内部,某些('siem.report.REPORTUUID='siem.netstat.NETSTEPORTUUID):-'未解析关系siem
。report
,无+-'未解析关系siem
。netstat
,无===优化的逻辑计划===org.apache.spark.sql.AnalysisException:找不到表:siem
。siem
0;===物理计划===org.apache.spark.sql.AnalysisException:找不到表:siem
。report
;
看起来您在这里混合了几个概念,这些概念正在创建一个错误。您正在创建的会话将打开一条直达Cassandra的线路,这意味着它将接受CQL而不是SQL。如果你想运行SQL,你可以做一个小的改变
SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
SchemaRDD results = sparkContext.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");
您可以从Spark上下文调用SparkSQL,而不是直接连接到Cassandra。更多信息:http://docs.datastax.com/en/latest-dse/datastax_enterprise/spark/sparkSqlJava.html