SparkSQL - how to reuse a previously selected value



I need the value returned by the first UDF (GetOtherTriggers) as the parameter of the second UDF (GetTriggerType).

The following code does not work:

val df = sql.sql(
  "select GetOtherTriggers(categories) as other_triggers, GetTriggerType(other_triggers) from my_table")

It returns the following exception: org.apache.spark.sql.AnalysisException: cannot resolve 'other_triggers' given input columns: [the columns of my_table];
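
For context, the two UDFs referenced in the query would have been registered on the same SQL context beforehand. The bodies below are only placeholders (the original post does not show them); just the registration pattern matters:

sql.udf.register("GetOtherTriggers", (categories: String) => {
  // placeholder logic: pull the "other" trigger names out of the categories string
  categories.split(",").filter(_.startsWith("trigger_")).mkString(",")
})

sql.udf.register("GetTriggerType", (otherTriggers: String) => {
  // placeholder logic: derive a type from the previously computed value
  if (otherTriggers.isEmpty) "none" else "custom"
})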

An alias defined in a SELECT list is not visible to other expressions in the same SELECT list; it only becomes available to an outer query. So you can use a subquery:

val df = sql.sql("""select GetTriggerType(other_triggers), other_triggers 
                 from (
                      select GetOtherTriggers(categories) as other_triggers, *
                      from my_table
                      ) withOther """)
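
The same rewrite can also be expressed as a common table expression, which Spark SQL supports; this is just the subquery above written with a WITH clause:

val dfCte = sql.sql("""
  with withOther as (
       select GetOtherTriggers(categories) as other_triggers, *
       from my_table
  )
  select GetTriggerType(other_triggers) as trigger_type, other_triggers
  from withOther """)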

Test:

val df = sc.parallelize(1 to 10).map(x => (x, x*2, x*4)).toDF("nr1", "nr2", "nr3");
df.createOrReplaceTempView("nr");
spark.udf.register("x3UDF", (x: Integer) => x*3);
spark.sql("""select x3UDF(nr1x3), nr1x3, nr3 
             from (
                   select x3UDF(nr1) as nr1x3, * 
                   from nr
                  ) a """)
     .show()

Gives:

+----------+-----+---+
|UDF(nr1x3)|nr1x3|nr3|
+----------+-----+---+
|         9|    3|  4|
|        18|    6|  8|
|        27|    9| 12|
|        36|   12| 16|
|        45|   15| 20|
|        54|   18| 24|
|        63|   21| 28|
|        72|   24| 32|
|        81|   27| 36|
|        90|   30| 40|
+----------+-----+---+
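
The same reuse also works with the DataFrame API instead of SQL: materialize the intermediate column with withColumn, then apply the UDF again on it. A minimal sketch, assuming the same tripling logic wrapped as a DataFrame-API udf:

import org.apache.spark.sql.functions.{col, udf}

// wrap the logic as a DataFrame-API udf and chain withColumn calls,
// so the second application reads the column produced by the first
val x3 = udf((x: Int) => x * 3)
df.withColumn("nr1x3", x3(col("nr1")))
  .withColumn("nr1x3_x3", x3(col("nr1x3")))
  .select("nr1x3_x3", "nr1x3", "nr3")
  .show()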
