Let's look at the following source code:
def foo(s1: Set[Int], s2: Set[Int], s3: Set[Int]): Set[Set[Int]] = {
  for {
    ss1 <- s1
    ss2 <- s2
    ss3 <- s3
  } yield Set(ss1, ss2, ss3)
}
How do you define an analogous function def foo(ss: Set[Int]*)?
It's almost the same as the usual cartesian product, except that you have to smash all the results into a set instead of collecting them into ordered tuples:
/** Forms cartesian product of sets,
  * then collapses each resulting tuple into a set.
  */
def collapsedCartesian[A](sets: Set[A]*): Set[Set[A]] = sets match
  case Seq() => Set(Set.empty)
  case Seq(h, t @ _*) => for a <- h; b <- collapsedCartesian(t: _*) yield (b + a)
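Note the base case: the collapsed product of zero sets is Set(Set.empty), i.e. the nullary cartesian product, rather than the empty set; otherwise the recursion would never yield anything. A quick check:

collapsedCartesian[Int]() // Set(Set()) -- a single result: the empty set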
Note that + here adds an element to a set (set + elem), a weirdly asymmetric operation denoted by such a symmetric sign.
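A quick illustration of that asymmetry with the standard immutable Set:

val s = Set(1, 2)
s + 3 // Set(1, 2, 3) -- the set on the left, the element on the right
s + 2 // Set(1, 2)    -- adding an element already present changes nothing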
The results look reasonably irregular:
collapsedCartesian(Set(1, 2), Set(3, 4)).foreach(println)
println("---")
collapsedCartesian(Set(1, 2), Set(1, 2)).foreach(println)
println("---")
collapsedCartesian(Set(1, 2, 3), Set(4, 5), Set(6, 7)).foreach(println)
println("---")
collapsedCartesian(Set(1, 2, 3), Set(2, 3, 4), Set(4, 5)).foreach(println)
gives:
Set(3, 1)
Set(4, 1)
Set(3, 2)
Set(4, 2)
---
Set(1)
Set(2, 1)
Set(2)
---
Set(7, 5, 1)
Set(6, 4, 2)
Set(6, 4, 1)
Set(7, 4, 1)
Set(6, 5, 1)
Set(7, 5, 3)
Set(7, 4, 2)
Set(6, 5, 2)
Set(6, 4, 3)
Set(7, 5, 2)
Set(7, 4, 3)
Set(6, 5, 3)
---
Set(5, 3, 1)
Set(5, 4, 2)
Set(5, 4, 1)
Set(4, 2)
Set(4, 1)
Set(5, 3)
Set(5, 3, 2)
Set(5, 4, 3)
Set(4, 2, 1)
Set(5, 2, 1)
Set(4, 3, 1)
Set(4, 3, 2)
Set(5, 2)
Set(4, 3)
Please don't ask how to do this in Spark; this exponentially exploding stuff is obviously useless for any dataset.
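For a sense of scale, the number of results is bounded above by the size of the plain cartesian product, i.e. the product of the input set sizes; collapsing duplicates can only shrink it. A small sketch reusing collapsedCartesian from above:

val sets = Seq(Set(1, 2, 3), Set(2, 3, 4), Set(4, 5))
val bound  = sets.map(_.size).product           // 18 raw tuples
val actual = collapsedCartesian(sets: _*).size  // 14 distinct sets, matching the output above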