I have a DataFrame containing data like this:
unit,sensitivity currency,trading desk,portfolio,issuer,bucket,underlying,delta,converted sensitivity
ES,USD,EQ DERIVATIVES,ESEQRED_LH_MIDX,5GOY,5,repo,0.00002,0.00002
ES,USD,EQ DERIVATIVES,IND_GLOBAL1,no_localizado,8,repo,-0.16962,-0.15198
ES,EUR,EQ DERIVATIVES,ESEQ_UKFLOWN,IGN2,8,repo,-0.00253,-0.00253
ES,USD,EQ DERIVATIVES,BASKETS1,9YFV,5,spot,-1003.64501,-899.24586
I have to aggregate this data, doing the following:
import org.apache.spark.sql.functions.sum

val filteredDF = myDF.filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio ='ESEQRED_LH_MIDX'")
  .groupBy("unit","trading desk","portfolio","issuer","bucket","underlying")
  .agg(sum("converted_sensitivity"))
But I can see that I'm losing precision in the aggregated sum, so how can I make sure that every value of "converted_sensitivity" is converted to BigDecimal(25,5) before performing the sum over the new aggregated column?

Thanks.
To ensure the conversion you can use DecimalType for the column in your DataFrame.

According to the Spark documentation, DecimalType is:

The data type representing java.math.BigDecimal values. A Decimal that must have fixed precision (the maximum number of digits) and scale (the number of digits on the right side of the dot). The precision can be up to 38, scale can also be up to 38 (less or equal to precision). The default precision and scale is (10, 0).

You can see this here.
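If the precision is already lost when the CSV is read with an inferred schema (the column comes in as a Double), an alternative is to declare the decimal columns up front. This is only a sketch, assuming the column names from the sample above and a hypothetical file path; the point is that DecimalType(25,5) can be set in the schema so the values never pass through a lossy Double:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, DecimalType}

// Hypothetical schema for the sample data above (column names assumed;
// the question's code refers to the last column as "converted_sensitivity").
val sensitivitySchema = StructType(Seq(
  StructField("unit", StringType),
  StructField("sensitivity currency", StringType),
  StructField("trading desk", StringType),
  StructField("portfolio", StringType),
  StructField("issuer", StringType),
  StructField("bucket", IntegerType),
  StructField("underlying", StringType),
  StructField("delta", DecimalType(25, 5)),
  StructField("converted_sensitivity", DecimalType(25, 5))
))

val myDF = spark.read
  .option("header", "true")
  .schema(sensitivitySchema)
  .csv("path/to/sensitivities.csv")  // hypothetical path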
To transform your data you can use the cast function of the Column object, like this:
import org.apache.spark.sql.types.DecimalType
import org.apache.spark.sql.functions.{col, sum}

val filteredDF = myDF.filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio ='ESEQRED_LH_MIDX'")
  // cast the column to DecimalType(25,5) before aggregating so the sum keeps full decimal precision
  .withColumn("new_column_big_decimal", col("converted_sensitivity").cast(DecimalType(25,5)))
  .groupBy("unit","trading desk","portfolio","issuer","bucket","underlying")
  .agg(sum("new_column_big_decimal"))
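If you prefer not to add an intermediate column, the same cast can be applied directly inside the aggregation. A minimal sketch, assuming the same DataFrame and column names as above (the alias name is just illustrative):

import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.sql.types.DecimalType

val aggregatedDF = myDF
  .filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio ='ESEQRED_LH_MIDX'")
  .groupBy("unit", "trading desk", "portfolio", "issuer", "bucket", "underlying")
  .agg(sum(col("converted_sensitivity").cast(DecimalType(25, 5))).as("sum_converted_sensitivity"))

// printSchema lets you check the result type; Spark typically widens the precision
// of a decimal sum (e.g. summing decimal(25,5) yields decimal(35,5)) to avoid overflow.
aggregatedDF.printSchema()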