From my investigation of Spark SQL, I found that more than two tables cannot be joined directly; we have to use subqueries to make it work. Using subqueries, I was able to join 3 tables
with the following query:
"SELECT name,age,gender,dpi.msisdn,subscriptionType,maritalStatus,isHighARPU,ipAddress,startTime,endTime,isRoaming,dpi.totalCount,dpi.website FROM(选择子名称,子页面,subscript.gender、subscript.msisdn、subscript.subscriptionType,subscript.maritalStatus,subscript.isHighARPU,cdr.ipAddress,cdr.startTime,cdr.endTime,cdr.isRoaming FROM SUBSCRIBER_META subsc,cdr_FACT cdr其中subscript.msisdn=cdr.msisdn AND cdr.isRoaming='Y')temp,DPI_FACT DPI其中temp.msidn=DPI.msisdn";
But following the same pattern, when I tried to join 4 tables, it gave me the following exception:
java.lang.RuntimeException: [1.517] failure: identifier expected
The query to join the 4 tables:
SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing WHERE inner.msisdn = billing.msisdn
Can anybody help me with this query?
Thanks in advance. The error is as follows:
09/02/2015 02:55:24 [ERROR] org.apache.spark.Logging$class: Error running job streaming job 1423479307000 ms.0
java.lang.RuntimeException: [1.517] failure: identifier expected
SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing where inner.msisdn = billing.msisdn
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:73)
at org.apache.spark.sql.api.java.JavaSQLContext.sql(JavaSQLContext.scala:49)
at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:596)
at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:546)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The exception occurred because you used Spark SQL's reserved keyword "inner" as an alias in your query. Avoid using Spark SQL keywords as custom identifiers.
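A minimal sketch of the fix, assuming sqlContext is the JavaSQLContext from your stack trace and the four tables are already registered as temp tables; the alias joined is an arbitrary, non-reserved name chosen to replace inner:

// Same four-table query, with the reserved keyword "inner" replaced by
// the non-reserved alias "joined" (any non-keyword identifier works).
String query =
    "SELECT name, dueAmount FROM ("
  + " SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus,"
  + " isHighARPU, ipAddress, startTime, endTime, isRoaming,"
  + " dpi.totalCount, dpi.website FROM ("
  + " SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn,"
  + " subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU,"
  + " cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming"
  + " FROM SUBSCRIBER_META subsc, CDR_FACT cdr"
  + " WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y'"
  + " ) temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn"
  + " ) joined, BILLING_META billing"
  + " WHERE joined.msisdn = billing.msisdn";
// JavaSQLContext.sql returns a JavaSchemaRDD in Spark 1.x
JavaSchemaRDD result = sqlContext.sql(query);

The same applies to the other aliases: temp, billing, and dpi are safe today, but renaming anything that might become a keyword is cheap insurance.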