How can I do this without converting the Dataset to an RDD?



Can someone help me avoid the RDD conversion here?

val qksDistribution: Array[((String, Int), Long)] = tripDataset
  .map(i => ((i.getFirstPoint.getQk.substring(0, QK_PARTITION_LEVEL), i.getProviderId), 1L))
  .rdd
  .reduceByKey(_ + _)
  .filter(_._2 > maxCountInPartition / 10)
  .collect
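
You can stay in the Dataset API by grouping on the key and counting, instead of dropping to an RDD for reduceByKey:

val qksDistribution: Array[((String, Int), Long)] = tripDataset
  .map(i => (i.getFirstPoint.getQk.substring(0, QK_PARTITION_LEVEL), i.getProviderId)) // no need to pair each key with 1L
  .groupByKey(x => x) // group by the (prefix, providerId) tuple itself, similar to keyBy on an RDD
  .count // count rows per key, which is what the reduceByKey(_ + _) computed
  .filter(_._2 > maxCountInPartition / 10)
  .collect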
