I have a function that returns an SLA value, considering only business hours (for when a request comes in):
def processDifference(enter_time, exit_time, key):
...logic is here ...
return sla_time
This works fine.
Now I have a table in Databricks with case_id, enter_time, exit_time, and key. I want to add a new column (sla_time) to the existing table and populate it with values from my Python function.

Based on your question, your DataFrame (df) might look like this:
+-------+-------------------+-------------------+-----+
|case_id|enter_time |exit_time |key |
+-------+-------------------+-------------------+-----+
|1 |2023-04-07 10:00:00|2023-04-07 11:00:00|a_key|
+-------+-------------------+-------------------+-----+
To use your Python function in Spark, you need to register it as a UDF and apply it to the DataFrame columns. Import the necessary package:

from pyspark.sql.functions import udf
def processDifference(enter_time, exit_time, key):
    # your business-hours SLA logic goes here
    return "some_sla_time"

processDifferenceUdf = udf(processDifference)  # <-- registering the function as a UDF

df.withColumn("sla_time", processDifferenceUdf("enter_time", "exit_time", "key")) \
  .show(truncate=False)
Output:

+-------+-------------------+-------------------+-----+-------------+
|case_id|enter_time         |exit_time          |key  |sla_time     |
+-------+-------------------+-------------------+-----+-------------+
|1      |2023-04-07 10:00:00|2023-04-07 11:00:00|a_key|some_sla_time|
+-------+-------------------+-------------------+-----+-------------+

To persist sla_time back into a Databricks table (rather than only showing it), write the DataFrame out, e.g. df.write.mode("overwrite").saveAsTable("your_table_name") (where "your_table_name" is whatever your table is called).
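Since the answer stubs out processDifference, here is a minimal pure-Python sketch of what a business-hours duration calculation might look like. This is an assumption-laden illustration, not your actual logic: it assumes business hours of 09:00–17:00 on weekdays, ignores holidays and the key argument, and the function name business_seconds is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed business hours: 09:00-17:00, Monday-Friday (hypothetical; adjust to
# match your real processDifference rules, holidays, time zones, etc.)
BUSINESS_START = 9
BUSINESS_END = 17

def business_seconds(enter_time: datetime, exit_time: datetime) -> float:
    """Count the seconds between enter_time and exit_time that fall
    inside business hours on weekdays."""
    total = 0.0
    current = enter_time
    while current < exit_time:
        day_start = current.replace(hour=BUSINESS_START, minute=0,
                                    second=0, microsecond=0)
        day_end = current.replace(hour=BUSINESS_END, minute=0,
                                  second=0, microsecond=0)
        if current.weekday() < 5:  # Monday (0) through Friday (4)
            start = max(current, day_start)
            end = min(exit_time, day_end)
            if end > start:
                total += (end - start).total_seconds()
        # jump to midnight of the next day
        current = (current + timedelta(days=1)).replace(hour=0, minute=0,
                                                        second=0, microsecond=0)
    return total

# 2023-04-07 is a Friday: one hour fully inside business hours
print(business_seconds(datetime(2023, 4, 7, 10, 0),
                       datetime(2023, 4, 7, 11, 0)))  # -> 3600.0
```

A plain-Python function like this can be wrapped with udf(...) exactly as shown above; Spark will call it row by row with the column values.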