I am facing an issue when executing Spark SQL on top of Spark Streaming.
The value of x is not printed for the line var x = sqlContext.sql("select count(*) from prices").
Please find my code below:
import spark.implicits._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Encoders
import org.apache.spark.streaming._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel
import java.util.regex.Pattern
import java.util.regex.Matcher
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql._
val conf = new SparkConf().setAppName("streamHive").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
val ssc = new StreamingContext(conf, Seconds(5))
val sc = ssc.sparkContext
val lines = ssc.textFileStream("file:///home/sdf/testHive")
case class Prices(name: String, age: String,sex: String, location: String)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
def parse(rdd: org.apache.spark.rdd.RDD[String]) = {
  var l = rdd.map(_.split(","))
  val prices = l.map(p => Prices(p(0), p(1), p(2), p(3)))
  val pricesDf = sqlContext.createDataFrame(prices)
  pricesDf.registerTempTable("prices")
  println("showing printdfShow")
  pricesDf.show()
  var x = sqlContext.sql("select count(*) from prices")
  println("hello")
  println(x)
}
lines.foreachRDD { rdd => parse(rdd) }
ssc.start()
I get the following output; it does not print the Spark SQL result:
[count(1): bigint]
showing printdfShow
+----+---+---+--------+
|name|age|sex|location|
+----+---+---+--------+
+----+---+---+--------+
hello
[count(1): bigint]
showing printdfShow
+----+---+---+--------+
|name|age|sex|location|
+----+---+---+--------+
| rop| 22| M| uk|
| fop| 24| F| us|
| dop| 23| M| fok|
+----+---+---+--------+
hello
[count(1): bigint]
showing printdfShow
+----+---+---+--------+
|name|age|sex|location|
+----+---+---+--------+
+----+---+---+--------+
hello
[count(1): bigint]
Please help me with how to use Spark SQL within Spark Streaming, as I am new to Spark.
Try this in your code, after pricesDf.show():
println(pricesDf.count)
If you want to do it in the same code, then try the following instead of println(x):
x.show
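With the three-row batch shown in your output, x.show would print the count as a one-row DataFrame, along the lines of:

+--------+
|count(1)|
+--------+
|       3|
+--------+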
x is a DataFrame, not a value, which is why println(x) only prints its schema ([count(1): bigint]) rather than the count. To get the count into a variable, you can try this:
println(x.rdd.map(r => r.getLong(0)).collect()(0))

Note that count(*) comes back as a bigint, so read it with getLong rather than getString (calling getString on a bigint column throws a ClassCastException at runtime).
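Putting it together, here is a minimal sketch of your parse function with both fixes applied, reusing the Prices case class, sqlContext, and temp table from your code:

def parse(rdd: org.apache.spark.rdd.RDD[String]) = {
  val prices = rdd.map(_.split(",")).map(p => Prices(p(0), p(1), p(2), p(3)))
  val pricesDf = sqlContext.createDataFrame(prices)
  pricesDf.registerTempTable("prices")
  pricesDf.show()
  val x = sqlContext.sql("select count(*) from prices")
  // Display the count as a one-row DataFrame
  x.show()
  // Extract the count into a plain Long (count(*) is a bigint, hence getLong)
  val count = x.rdd.map(r => r.getLong(0)).collect()(0)
  println(count)
}

Even on an empty batch this is safe, because count(*) always returns exactly one row (with the value 0).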