PrefixSpan output format



I am trying to run the following example code:

import org.apache.spark.mllib.fpm.PrefixSpan
val sequences = sc.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2)),
  Array(Array(1, 2), Array(5)),
  Array(Array(6))
), 2).cache()
val prefixSpan = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(5)
val model = prefixSpan.run(sequences)
model.freqSequences.collect().foreach { freqSequence =>
  println(
    freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") +
      ", " + freqSequence.freq
  )
}

I need to format model.freqSequences into something like the following (a DataFrame with the sequence and its frequency):

|[WrappedArray(2,3)] |  3
|[WrappedArray(1)]   |  2
|[WrappedArray(2,1)] |  1

Calling flatten on freqSequence.sequence and then applying toDF should give you the output you want:

model.freqSequences.map(freqSequence => (freqSequence.sequence.flatten, freqSequence.freq)).toDF("array", "freq").show(false)

This should give you:

+------+----+
|array |freq|
+------+----+
|[2]   |3   |
|[3]   |2   |
|[1]   |3   |
|[2, 1]|3   |
|[1, 3]|2   |
+------+----+
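One thing to keep in mind: flatten merges all itemsets of a pattern into a single flat array, so a pattern with one itemset and a pattern with several ordered itemsets look the same in shape afterwards. The sketch below is plain Scala (no Spark needed) and only illustrates that difference. The object name FlattenSketch, the helper names flat and nested, and the hand-written patterns list are assumptions for illustration; the list mirrors the five patterns in the table above rather than being read from a model.

```scala
// Plain-Scala sketch; FlattenSketch, flat and nested are hypothetical names.
object FlattenSketch {
  // Hand-written (sequence, freq) pairs mirroring the table above.
  // The last pattern has two itemsets: <(1)(3)>.
  val patterns: Seq[(Array[Array[Int]], Long)] = Seq(
    (Array(Array(2)), 3L),
    (Array(Array(3)), 2L),
    (Array(Array(1)), 3L),
    (Array(Array(2, 1)), 3L),
    (Array(Array(1), Array(3)), 2L)
  )

  // Flattening drops the itemset boundaries.
  def flat(seq: Array[Array[Int]]): String =
    seq.flatten.mkString("[", ", ", "]")

  // Formatting each itemset separately keeps the boundaries visible.
  def nested(seq: Array[Array[Int]]): String =
    seq.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]")

  def main(args: Array[String]): Unit =
    patterns.foreach { case (seq, freq) =>
      println(s"${flat(seq)} | ${nested(seq)} | $freq")
    }
}
```

If the distinction between itemsets matters for your use case, it may be worth keeping the nested form in the DataFrame column instead of flattening it.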

I hope this answer helps.
