Spark can't find window function



Using the solution provided at https://stackoverflow.com/a/32407543/5379015, I am trying to recreate the same query, but using the programmatic SQL syntax instead of the DataFrame API, as follows:

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
object HiveContextTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("HiveContextTest")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._
    val df = sc.parallelize(
      ("foo", 1) :: ("foo", 2) :: ("bar", 1) :: ("bar", 2) :: Nil
    ).toDF("k", "v")

    // using dataframe api works fine
    val w = Window.partitionBy($"k").orderBy($"v")
    df.select($"k",$"v", rowNumber().over(w).alias("rn")).show

    //using programmatic syntax doesn't work
    df.registerTempTable("df")
    val w2 = sqlContext.sql("select k,v,rowNumber() over (partition by k order by v) as rn from df")
    w2.show()
  }
}

The first `df.select($"k", $"v", rowNumber().over(w).alias("rn")).show` works fine, but `w2.show()` results in

Exception in thread "main" org.apache.spark.sql.AnalysisException: Couldn't find window function rowNumber;

Does anyone have any idea how I can make this work with the programmatic syntax?

The SQL equivalent of `rowNumber` is `row_number`:

SELECT k, v, row_number() OVER (PARTITION BY k ORDER BY v) AS rn FROM df
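With the function name corrected, the SQL route should produce the same result as the DataFrame call. A minimal sketch of the fix, assuming the same `sqlContext` and the `df` temp table registered in the question's code (this needs a Spark 1.x runtime with Hive support on the classpath to run):

```scala
// Only the SQL string changes: rowNumber() (the Scala DataFrame-API name)
// is spelled row_number() in SQL, as in HiveQL.
val w2 = sqlContext.sql(
  "SELECT k, v, row_number() OVER (PARTITION BY k ORDER BY v) AS rn FROM df"
)
w2.show()  // same rows as the DataFrame-API version above
```

The mismatch exists because the Scala function `rowNumber()` (renamed `row_number()` in later Spark versions) is a DataFrame-API identifier, while the SQL parser resolves window functions against the Hive/SQL function registry, which only knows the underscored name.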
