How to use a custom function in a Spark DataFrame query with Scala

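I loaded data from a database into a Spark DataFrame named DF, and now I need to extract the records whose ID satisfies a particular condition. To do that, I defined this function: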

def hash_id(id: String): Int = {
  val two_char = id.takeRight(2).toInt
  val hash_result = two_char % 4
  return hash_result
}

Then I use the function in this query:

DF.filter(hash_id("ID")===3)

But I get this error:

value === is not a member of Int

DF has an ID column.

Could you show me how to use a custom function in a where/filter clause?

Any help would be much appreciated.

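=== is a method of Column, so it can only be called on a Column object. That is why you get the error value === is not a member of Int: your function hash_id returns an Int, not a Column. Note also that hash_id("ID") is applied to the literal string "ID", not to the values in the ID column.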

To use your function, convert it into a user-defined function (UDF) and apply that UDF to a Column object, as follows:

import org.apache.spark.sql.functions.{col, udf}

// Plain Scala function: last two characters of the ID as an integer, modulo 4
def hash_id(id: String): Int = {
  val two_char = id.takeRight(2).toInt
  two_char % 4
}

// Wrap the function as a UDF so it can be applied to a Column
val hash_id_udf = udf((id: String) => hash_id(id))

DF.filter(hash_id_udf(col("ID")) === 3)
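
One thing to watch for: hash_id throws if an ID is null or if its last two characters are not digits, and inside a UDF that would fail the whole job. Below is a minimal sketch of a more defensive variant, assuming such IDs should simply be skipped. The name safeHashId is hypothetical and not part of the original answer.

```scala
import scala.util.Try

// Hypothetical defensive variant of hash_id (an assumption, not from the original answer).
// Returns None when the ID is null or its last two characters cannot be parsed as an Int.
def safeHashId(id: String): Option[Int] =
  Option(id).flatMap(s => Try(s.takeRight(2).toInt % 4).toOption)

// Sample calls on made-up IDs
val ok      = safeHashId("AB123") // last two chars "23", 23 % 4 = 3
val bad     = safeHashId("X7")    // "X7" is not a number
val missing = safeHashId(null)    // null input
```

As far as I know, Spark accepts Option return types in Scala UDFs and maps None to null, so wrapping safeHashId with udf in the same way as above should make the === 3 filter drop those rows instead of failing, but verify this against your Spark version.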
