Need to insert values from my Python function into an existing table in Databricks



I have a function that returns an SLA value taking only business hours into account (for when a request comes in):

def processDifference(enter_time, exit_time, key):
    # ...logic is here ...
    return sla_time

This works fine.
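The body of processDifference is not shown in the question; purely as an illustration, a business-hours difference could be sketched as below. The 09:00–17:00 Monday–Friday window and the result being in seconds are assumptions, not something stated in the question.

```python
from datetime import datetime, timedelta

BUSINESS_START = 9   # assumed business-day start hour
BUSINESS_END = 17    # assumed business-day end hour

def business_seconds(enter_time: datetime, exit_time: datetime) -> float:
    """Count the seconds between two timestamps that fall within business
    hours (Mon-Fri, 09:00-17:00). Iterates day by day, which is fine for
    the short spans an SLA calculation usually covers."""
    total = 0.0
    current = enter_time
    while current < exit_time:
        day_start = current.replace(hour=BUSINESS_START, minute=0,
                                    second=0, microsecond=0)
        day_end = current.replace(hour=BUSINESS_END, minute=0,
                                  second=0, microsecond=0)
        if current.weekday() < 5:  # Monday-Friday only
            window_start = max(current, day_start)
            window_end = min(exit_time, day_end)
            if window_end > window_start:
                total += (window_end - window_start).total_seconds()
        # jump to midnight of the next calendar day
        current = (current + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0)
    return total
```

For example, an interval from Friday 16:00 to Monday 10:00 yields two business hours (one on Friday, one on Monday), skipping the weekend entirely.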

Now I have a table in Databricks with case_id, enter_time, exit_time, and key, and I want to add a new column (sla_time) to the existing table, populated from my Python function.

Based on your question, your DataFrame (df) might look like this:

+-------+-------------------+-------------------+-----+
|case_id|enter_time         |exit_time          |key  |
+-------+-------------------+-------------------+-----+
|1      |2023-04-07 10:00:00|2023-04-07 11:00:00|a_key|
+-------+-------------------+-------------------+-----+

Import the necessary package:

from pyspark.sql.functions import udf

To use your Python function in Spark, it needs to be registered as a UDF and applied to the DataFrame columns.

def processDifference(enter_time, exit_time, key):
    return "some_sla_time"

processDifferenceUdf = udf(processDifference)  # <-- registering the function as a UDF

df \
    .withColumn("sla_time", processDifferenceUdf("enter_time", "exit_time", "key")) \
    .show(truncate=False)

Output:
+-------+-------------------+-------------------+-----+-------------+
|case_id|enter_time         |exit_time          |key  |sla_time     |
+-------+-------------------+-------------------+-----+-------------+
|1      |2023-04-07 10:00:00|2023-04-07 11:00:00|a_key|some_sla_time|
+-------+-------------------+-------------------+-----+-------------+
