Spark DataFrame - Adding a column with a user-defined function

I'm still learning Python. In the example below (taken from method 3 of this article), the user-defined function (UDF) is named Total(...,...), but the author calls it as new_f(...,...).

In the code below, how do we know that the call new_f(...,...) should invoke Total(...,...)? What if there were another UDF, say Sum(...,...)? In that case, how would the code know whether new_f(...,...) means Total(...,...) or Sum(...,...)?

# import the functions as F from pyspark.sql
import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType

# define Total as an ordinary Python function
def Total(Course_Fees, Discount):
    res = Course_Fees - Discount
    return res

# wrap Total as a UDF that returns an integer and bind it to the name new_f
new_f = F.udf(Total, IntegerType())

# call the UDF to create the new column Total_price
# (df is an existing DataFrame with Course_Fees and Discount columns)
new_df = df.withColumn(
    "Total_price", new_f("Course_Fees", "Discount"))

# show the DataFrame
new_df.show()
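To see why the name new_f is arbitrary, here is a plain-Python stand-in for F.udf. It is a hypothetical sketch, not PySpark's real implementation: make_udf, the list-of-dicts "rows", and the fee values are all made up for illustration. The point it shows is that the wrapper simply keeps a reference to whichever function object you pass in, so the left-hand name can be anything.

```python
# Hypothetical stand-in for F.udf (NOT PySpark's implementation).
# The wrapper keeps a reference to the function object passed in.

def make_udf(func, return_type):
    """Return a callable that applies `func` to the named columns of a row."""
    def wrapper(*col_names):
        # Build a "column expression": computes the value for one row (a dict)
        def compute(row):
            value = func(*(row[name] for name in col_names))
            return return_type(value)
        return compute
    wrapper.func = func  # remember which Python function this wrapper calls
    return wrapper

def Total(Course_Fees, Discount):
    return Course_Fees - Discount

# The name on the left is just a variable; it could be anything
new_f = make_udf(Total, int)

# Made-up rows standing in for a DataFrame
rows = [
    {"Course_Fees": 22000, "Discount": 1000},
    {"Course_Fees": 25000, "Discount": 2300},
]
total_price = new_f("Course_Fees", "Discount")
print([total_price(r) for r in rows])  # [21000, 22700]
print(new_f.func is Total)             # True
```

In the real code, F.udf(Total, IntegerType()) plays the role of make_udf(Total, int): the link to Total is made by passing Total as an argument, not by any naming convention.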

new_f = F.udf(Total, IntegerType())

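This line is what connects the two names: F.udf(Total, IntegerType()) wraps the Python function Total as a UDF, and the assignment binds the name new_f to that UDF. A Sum UDF would need its own assignment to a different name (for example sum_f = F.udf(Sum, IntegerType())), and whichever name you call determines which function runs.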
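The Total-versus-Sum question can be illustrated without Spark. In this sketch, Sum and the name sum_f are hypothetical, and plain assignment stands in for the F.udf(...) calls. Python resolves new_f by looking up whatever object that name was most recently bound to, so there is never any ambiguity between the two functions.

```python
# Two ordinary Python functions; Sum is the hypothetical second UDF from the question
def Total(Course_Fees, Discount):
    return Course_Fees - Discount

def Sum(Course_Fees, Discount):
    return Course_Fees + Discount

# Each assignment binds a different name to a different function object.
# In PySpark these would be F.udf(Total, IntegerType()) and F.udf(Sum, IntegerType()).
new_f = Total
sum_f = Sum

print(new_f(100, 10))  # 90  -> runs Total
print(sum_f(100, 10))  # 110 -> runs Sum

# Rebinding a name changes what it refers to; Total itself is untouched
new_f = Sum
print(new_f(100, 10))  # 110
print(Total(100, 10))  # 90
```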
