从列表[列表[字符串]]中删除子集的有效方法是什么?



我有一个List[String]的ListBuffer,val tList = ListBuffer[TCount]TCount在哪里case class TCount(l: List[String], c: Long)。我想从tList中找到那些列表l,这些列表不是任何其他tlist元素的子集,并且它们的c值小于它们的超集c值。以下程序有效,但我必须使用两个 for 循环,这使得代码效率低下。有没有更好的方法来使代码高效?

val _arr = tList.toArray
for (i <- 0 to (_arr.length - 1)) {
val il = _arr(i).l.toSet
val ic = _arr(i).c
for (j <- 0 to (_arr.length - 1)) {
val jl = _arr(j).toSet
val jc = _arr(j).c
if (i != j && il.subsetOf(jl) && ic >= jc) { 
tList.-=(_arr(i))
}
}
}

灵感来自设置评论:

import scala.collection.SortedMap
class SetTrie[A](val flag: Boolean, val children: SortedMap[A, SetTrie[A]])(implicit val ord: Ordering[A]) {
def insert(xs: List[A]): SetTrie[A] = xs match {
case Nil => new SetTrie(true, children)
case a :: rest => {
val current = children.getOrElse(a, new SetTrie[A](false, SortedMap.empty))
val inserted = current.insert(rest)
new SetTrie(flag, children + (a -> inserted))
}
}
def containsSuperset(xs: List[A], strict: Boolean): Boolean = xs match {
case Nil => !children.isEmpty || (!strict && flag)
case a :: rest => {
children.get(a).map(_.containsSuperset(rest, strict)).getOrElse(false) ||
children.takeWhile(x => ord.lt(x._1, a)).exists(_._2.containsSuperset(xs, false))
}
}
}
def removeSubsets[A : Ordering](xss: List[List[A]]): List[List[A]] = {
val sorted = xss.map(_.sorted)
val setTrie = sorted.foldLeft(new SetTrie[A](false, SortedMap.empty)) { case (st, xs) => st.insert(xs) }
sorted.filterNot(xs => setTrie.containsSuperset(xs, true))
}

这是一个依赖于与 Set-Trie 有点相似的数据结构的方法,但它显式存储了更多的子集。它提供的压缩效果更差,但在查找过程中速度更快:

def findMaximal(lists: List[List[String]]): List[List[String]] = {
import collection.mutable.HashMap
class Node(
var isSubset: Boolean = false, 
val children: HashMap[String, Node] = HashMap.empty
) {
def insert(xs: List[String], isSubs: Boolean): Unit = if (xs.isEmpty) {
isSubset |= isSubs
} else {
var isSubsSubs = false || isSubs
for (h :: t <- xs.tails) {
children.getOrElseUpdate(h, new Node()).insert(t, isSubsSubs)
isSubsSubs = true
}
}
def isMaximal(xs: List[String]): Boolean = xs match {
case Nil => children.isEmpty && !isSubset
case h :: t => children(h).isMaximal(t)
}
override def toString: String = {
if (children.isEmpty) "#"
else children.flatMap{ 
case (k,v) => {
if (v.children.isEmpty) List(k)
else (k + ":") :: v.toString.split("n").map("  " + _).toList
}
}.mkString("n")
}
}
val listsWithSorted = for (x <- lists) yield (x, x.sorted)
val root = new Node()
for ((x, s) <- listsWithSorted) root.insert(s, false)
// println(root)
for ((x, s) <- listsWithSorted; if root.isMaximal(s)) yield x
}

请注意,我被允许在方法主体内做任何类型的可变废话,因为可变的 trie 数据结构永远不会逃脱方法的范围,因此不能无意中与另一个线程共享。

下面是一个包含字符集(转换为字符串列表(的示例:

println(findMaximal(List(
"ab", "abc", "ac", "abd",
"ade", "efd", "adf", "bafd",
"abd", "fda", "dba", "dbe"
).map(_.toList.map(_.toString))))

输出为:

List(
List(a, b, c), 
List(a, d, e), 
List(e, f, d), 
List(b, a, f, d), 
List(d, b, e)
)

事实上,非极大元素abacabdadffdadba都被消除了。

这是我的不完全设置trie数据结构的样子(子节点缩进(:

e:
f
b:
e
d:
e
f
c
f
d:
e:
f
f
a:
e
b:
d:
f
c
f
d:
e
f
c
f
c
f

不确定您是否可以避免复杂性,但是,我想我会这样写:

val tList = List(List(1, 2, 3), List(3, 2, 1), List(9, 4, 7), List(3, 5, 6), List(1, 5, 6), List(6, 1, 5))
val tSet = tList.map(_.toSet)
def result = tSet.filterNot { sub => tSet.count(_.subsetOf(sub)) > 1 }

这里有一种方法:

  1. 创建用于标识原始列表元素的indexed Map
  2. 将列表元素的映射转换为Map of Sets(带索引(
  3. 生成 Map 元素的combinations,并使用自定义过滤器捕获其他元素subset的元素
  4. 从集合映射中删除这些subset元素,并通过索引从列表映射中检索剩余元素

示例代码:

type TupIntSet = Tuple2[Int, Set[Int]]
def subsetFilter(ls: List[TupIntSet]): List[TupIntSet] = 
if ( ls.size != 2 ) List.empty[TupIntSet] else
if ( ls(0)._2 subsetOf ls(1)._2 ) List[TupIntSet]((ls(0)._1, ls(0)._2)) else
if ( ls(1)._2 subsetOf ls(0)._2 ) List[TupIntSet]((ls(1)._1, ls(1)._2)) else
List.empty[TupIntSet]
val tList = List(List(1,2), List(1,2,3), List(3,4,5), List(5,4,3), List(2,3,4), List(6,7))
val listMap = (Stream from 1).zip(tList).toMap
val setMap = listMap.map{ case (i, l) => (i, l.toSet) }
val tSubsets = setMap.toList.combinations(2).toSet.flatMap(subsetFilter)
val resultList = (setMap.toSet -- tSubsets).map(_._1).map(listMap.getOrElse(_, ""))
// resultList: scala.collection.immutable.Set[java.io.Serializable] =
//   Set(List(5, 4, 3), List(2, 3, 4), List(6, 7), List(1, 2, 3))

最新更新