
Where should I register a Spark UDF for a generic function?

Tags: apache-spark, apache-spark-sql, user-defined-functions
Updated: 2023-09-10

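I have a generic function that is called many times with different parameters. It uses a UDF to manipulate the year and month. Is it good practice to register the UDF inside the function that gets called? If not, what is the best practice? What is the performance cost of registering the same UDF again and again?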

from pyspark.sql import functions as F

def get_date_from_year_and_month(year_month):
    """Returns year and month in the format YYYY-MM."""
    year, month = year_month
    return str(year) + '-' + str(month).zfill(2)

def function_that_uses_udf(param):
    # Should this be done outside the function?
    get_date_from_year_and_month_udf = F.udf(get_date_from_year_and_month)
    # df_old is an existing DataFrame with 'year' and 'month' columns.
    df = df_old.withColumn(
        'date', get_date_from_year_and_month_udf(F.struct([F.col('year'), F.col('month')])))
    return df

For example, in this case the UDF is registered through the Spark context every time:

def squared(s):
    return s * s

# `spark` is assumed to be an existing SparkSession.
spark.udf.register("squaredWithPython", squared)

This is unlike a function that can be stored in the database.
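To make the alternative in the question concrete, here is a minimal sketch of building the UDF wrapper once and reusing it across calls, rather than recreating it inside `function_that_uses_udf`. It is one possible arrangement, not a statement of Spark's recommended practice. The helper name `get_date_udf`, the use of `functools.lru_cache`, the explicit `StringType` return type, and passing `df_old` as an argument are all assumptions made for illustration. The `pyspark` imports are placed inside the functions only so that the plain Python function can be tested without a Spark installation.

```python
from functools import lru_cache


def get_date_from_year_and_month(year_month):
    """Return year and month in the format YYYY-MM."""
    year, month = year_month
    return str(year) + "-" + str(month).zfill(2)


@lru_cache(maxsize=None)
def get_date_udf():
    """Hypothetical helper: build the UDF wrapper once, then reuse it."""
    # Imported lazily so the plain function above stays testable without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    return F.udf(get_date_from_year_and_month, StringType())


def function_that_uses_udf(df_old, param=None):
    """Add a 'date' column; assumes df_old has 'year' and 'month' columns."""
    from pyspark.sql import functions as F

    # The cached wrapper is fetched here instead of being rebuilt on each call.
    return df_old.withColumn(
        "date", get_date_udf()(F.struct(F.col("year"), F.col("month")))
    )
```

Because the formatting logic lives in an ordinary function, it can be checked on its own, for example by calling `get_date_from_year_and_month((2023, 9))`, before it is ever wrapped as a UDF.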
