Running SparkSQL stage jobs in parallel



I am loading a text file:

val adReqRDD = sc.textFile("/Users/itru/Desktop/vastrack_sample_old.rtf")

and registering the data as a temp table:

adReqRDD.registerTempTable("adreqdata")

I need to run the following queries against that table:

val alladreq = sqlContext.sql("select DeviceId,count(EventType) as AllAdreqCount from adreqdata where EventType = 1 and Network = 0 group by DeviceId ")
val adreqPerDeviceid = sqlContext.sql("select DeviceId,count(EventType) as AdreqCount from adreqdata where EventType = 1 and Network = 0 and PlacementId <> '-' and BundleID <> '-' and DeviceId <> '-' and IPAddress <> '-' group by DeviceId ")
val adreqPerDeviceidtoSpotx = sqlContext.sql("select DeviceId,count(EventType) as AdreqCountToSpotx from adreqdata where EventType = 1 and Network = 9 and PlacementId <> '-' and BundleID <> '-' and DeviceId <> '-' and IPAddress <> '-' group by DeviceId ")

When my job starts, all three active stages run sequentially. How can I make them run in parallel?

You can use Futures to launch the Spark actions in parallel, like this:

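import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val queries = Seq(
  "query1",
  "query2",
  "query3"
)
// Start each query in its own Future so the driver submits the jobs concurrently
val results = Future.traverse(queries)(q => Future {
  val queryResult = sqlContext.sql(q)
  queryResult.write.format...
})
// Block until every query has finished
Await.result(results, Duration.Inf)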
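
The Future pattern itself can be tried without a cluster. Below is a minimal sketch that uses only the Scala standard library. `ParallelQueries`, `runQuery` and `runAll` are hypothetical names introduced for illustration, and `runQuery` is just a stand-in for a blocking Spark action such as `sqlContext.sql(q).count()`. It also assumes a dedicated thread pool with one thread per query instead of the global execution context, so that every action can be submitted at once.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelQueries {
  // Hypothetical stand-in for a blocking Spark action,
  // e.g. sqlContext.sql(q).count(); here it only sleeps.
  def runQuery(q: String): Int = {
    Thread.sleep(200) // simulate the running time of the job
    q.length
  }

  // Runs all queries concurrently and returns the results in input order.
  def runAll(queries: Seq[String]): Seq[Int] = {
    // Assumption: one thread per query (at least one, so the pool is valid)
    val pool = Executors.newFixedThreadPool(math.max(1, queries.size))
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(pool)
    try {
      // Future.traverse starts one Future per query and collects the results
      val results = Future.traverse(queries)(q => Future(runQuery(q)))
      Await.result(results, Duration.Inf)
    } finally {
      pool.shutdown() // release the threads once all queries are done
    }
  }
}
```

Calling `ParallelQueries.runAll(Seq("query1", "query2", "query3"))` should take roughly as long as one simulated query rather than three. With real Spark jobs, submitting from separate threads only makes the jobs eligible to run at the same time. Whether they actually overlap still depends on free executor resources and on the scheduler mode (`spark.scheduler.mode`, which is FIFO by default).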
