Is there a way to get each item in a list without using a for loop?



Is there any way to get each item in a list without using a for loop?

That is, given the data

+----+---------+-------------+----------+-----------+
|  id|     date|   revenue   |con_dist_1| con_dist_2|
+----+---------+-------------+----------+-----------+
|3310|1/15/2018|  0.010680705|         6|0.019875458|
|3310|1/15/2018|  0.006628853|         4|0.816039063|
|3310|1/15/2018|   0.01378215|         4|0.082049528|
|3310|1/15/2018|  0.010680705|         6|0.019875458|
|3310|1/15/2018|  0.006628853|         4|0.816039063|
|3310|1/15/2018|   0.01378215|         4|0.082049528|
|3310|1/15/2018|  0.010680705|         6|0.019875458|
|3310|1/15/2018|  0.010680705|         6|0.019875458|
|3310|1/15/2018|  0.014933087|         5|0.034681906|
|3310|1/15/2018|  0.014448282|         3|0.082049528|
+----+---------+-------------+----------+-----------+

val col_list = Array("con_dist_1", "con_dist_2")
// One inner array per column, one entry per requested probability
val median_col_list = partitioned_data.stat.approxQuantile(col_list, Array(0.0, 0.1, 0.5), 0.0)

// Positions of the 0th and 10th percentiles in the probabilities array
val percentile_0 = 0
val percentile_10 = 1

val Q0 = median_col_list(col_list.indexOf("con_dist_1"))(percentile_0)
val Q10 = median_col_list(col_list.indexOf("con_dist_1"))(percentile_10)

Without looping over col_list, is there a way to compute percentile_0 and percentile_10 for every item in col_list? I mean in parallel, using map or something similar.

I will answer the question "How do I compute multiple (approximate) percentiles for multiple columns at once?"

According to the DataFrameStatFunctions documentation, the signature

approxQuantile(cols: Array[String], probabilities: Array[Double], relativeError: Double): Array[Array[Double]]

is available (only) since 2.2.0.

If you are on an older version of Spark, this signature does not exist and the computation is not as straightforward.
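For older versions, one possible fallback is the single-column overload approxQuantile(col: String, probabilities: Array[Double], relativeError: Double), which has been available since Spark 2.0.0. The sketch below maps over the column names instead of writing an explicit for loop. The object and method names (PerColumnQuantiles, quantilesPerColumn) are made up for illustration, and note the assumption that one approxQuantile call per column is acceptable: this makes a separate pass over the data for each column rather than computing everything at once.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark: emulates the multi-column
// signature by calling the single-column overload once per column.
object PerColumnQuantiles {
  def quantilesPerColumn(
      df: DataFrame,
      cols: Array[String],
      probabilities: Array[Double],
      relativeError: Double): Array[Array[Double]] =
    // map replaces the explicit loop; the result keeps the order of cols
    cols.map(c => df.stat.approxQuantile(c, probabilities, relativeError))
}
```

The result has the same shape as the multi-column call: the outer index follows cols and the inner index follows probabilities.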

Here is an example with your data, using Spark 2.4.0.

// toDF needs spark.implicits._, which spark-shell imports automatically
val df = Seq(
  (3310, "1/15/2018", 0.010680705, 6, 0.019875458),
  (3310, "1/15/2018", 0.006628853, 4, 0.816039063),
  (3310, "1/15/2018", 0.01378215, 4, 0.082049528),
  (3310, "1/15/2018", 0.010680705, 6, 0.019875458),
  (3310, "1/15/2018", 0.006628853, 4, 0.816039063),
  (3310, "1/15/2018", 0.01378215, 4, 0.082049528),
  (3310, "1/15/2018", 0.010680705, 6, 0.019875458),
  (3310, "1/15/2018", 0.010680705, 6, 0.019875458),
  (3310, "1/15/2018", 0.014933087, 5, 0.034681906),
  (3310, "1/15/2018", 0.014448282, 3, 0.082049528)
).toDF("id", "date", "revenue", "con_dist_1", "con_dist_2")

// 10th and 50th percentiles of both columns; a relativeError of 0 requests exact quantiles
df.stat.approxQuantile(Array("con_dist_1", "con_dist_2"), Array(0.1, 0.5), 0)

Output (the first dimension is the column and the second is the requested percentile, so for example the 10th percentile of con_dist_1 is 3.0):

Array[Array[Double]] = Array(Array(3.0, 4.0), Array(0.019875458, 0.034681906))
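To read individual values without indexOf or an explicit loop, the nested array can be zipped with the column names and probabilities. This is only a sketch in plain Scala: the quantiles value is copied from the output above so the snippet runs without Spark, and the names QuantileLookup and byColumn are made up for illustration.

```scala
object QuantileLookup {
  val colList = Array("con_dist_1", "con_dist_2")
  val probabilities = Array(0.1, 0.5)

  // Copied from the output above; in a real job this would come from
  // df.stat.approxQuantile(colList, probabilities, 0)
  val quantiles: Array[Array[Double]] =
    Array(Array(3.0, 4.0), Array(0.019875458, 0.034681906))

  // column name -> (probability -> value), built with zip and map only
  val byColumn: Map[String, Map[Double, Double]] =
    colList.zip(quantiles).map { case (name, values) =>
      name -> probabilities.zip(values).toMap
    }.toMap

  // 10th percentile of con_dist_1
  val q10: Double = byColumn("con_dist_1")(0.1)
}
```

Looking values up by column name and probability avoids keeping separate position constants such as percentile_0 and percentile_10 in sync with the probabilities array.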
