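I'm new to Spark and Scala.
I have an RDD of the form (presentation, CompactBuffer(3,3,24,24,24,24,28,28)) that I'm trying to convert to (presentation, List((3,2),(24,5),(28,3))).
I was able to convert it to the form (String, Iterable[String]): (presentation, List((3,1), (3,1), (24,1), (24,1), ...)).
How can I group them into (3,2), (24,3)?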
```
val RDD4 = RDD3.map {
  case (key, values) =>
    // Pair every value with an initial count of 1
    val v = values.map(word => (word, 1))
    (key, v)
}
```
You can get it like this:
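```
List((3,1), (3,1), (24,1), (24,1), (24,1), (24,1), (24,1), (28,1), (28,1), (28,1))
  .groupBy { case (key, _) => key }   // group the pairs that share the same first element
  .mapValues(                         // then sum the 1s within each group
    valuesWithSameKeyList => valuesWithSameKeyList
      .map { case (_, value) => value }
      .sum
  )
// 3 -> 2, 24 -> 5, 28 -> 3 (the result is a Map, so entry order is not guaranteed)
```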
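To apply the same idea per key on the RDD itself, one option is to count the occurrences inside each key's group directly, skipping the intermediate (value, 1) pairs. The following is only a sketch: the helper name `countOccurrences` is made up for illustration, the sample values are copied from the question as written, and the RDD usage in the trailing comment assumes `RDD3` has type `RDD[(String, Iterable[String])]`.

```scala
// Count how often each value occurs within one key's group.
// Uses plain Scala collections only, so it can be tried without a Spark context.
// `countOccurrences` is a hypothetical helper name, not part of Spark or Scala.
def countOccurrences[A](values: Iterable[A]): List[(A, Int)] =
  values
    .groupBy(identity)                                               // value -> all of its occurrences
    .map { case (value, occurrences) => (value, occurrences.size) }  // value -> count
    .toList

// Sample values from the question (element type assumed to be String).
val sample: Iterable[String] = Seq("3", "3", "24", "24", "24", "24", "28", "28")

// Sorted numerically only to make the result order predictable.
val counts = countOccurrences(sample).sortBy(_._1.toInt)
// counts == List(("3", 2), ("24", 4), ("28", 2))

// Possible use on the RDD from the question (assumed type noted above):
//   val RDD4 = RDD3.mapValues(values => countOccurrences(values))
```

Using `mapValues` rather than `map` keeps the key untouched, so the key only needs to be handled once.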