r-使用lapply对嵌套循环进行矢量化

我正在尝试对拓扑排序进行矢量化，以便更快地运行
它的一部分是具有嵌套for的while。我在矢量化方面遇到了问题。该功能的思想是排序相互依赖的任务

这是我迄今为止的代码：

tsort <- function(deps) {
  nm <- names(deps)
  libs <- union(as.vector(unlist(deps)), nm)
  s <- c()
  s <- unlist(lapply(libs,function(x){
    if(!(x %in% nm)) {
      s <- c(s, x)
    }
  }))
  k <- 1
  while(k > 0) {
    k <- 0
    for(x in setdiff(nm, s)) {
      r <- c(s, x)
      if(length(setdiff(deps[[x]], r)) == 0) {
        s <- r
        k <- 1
      }
    }
  }
  if(length(s) < length(libs)) {
    v <- setdiff(libs, s)
    stop(sprintf("Unorderable items :n%s", paste("", v, sep="", collapse="n")))
  }
  s
}

这是一个相互依赖的任务列表，可以使用以下函数进行排序：

tasks <- list(
"seven" = c("eight", "nine", "ten", "seven", "five", "one", "eleven", "two"),
"one" = c("two", "one", "three", "four"),
"five" = c("two", "five", "three"),
"six" = c("eight", "nine", "three", "six", "five", "one", "two", "four"),
"twelve" = c("twelve", "two", "one", "three", "four"),
"thirteen" = c("thirteen", "two", "three"),
"fourteen" = c("fourteen", "two", "three"),
"fifteen" = c("two", "three"),
"three" = c("two", "three"),
"four" = c("two", "four"),
"eleven" = c("eight", "two"),
"ten" = c("two", "ten"),
"nine" = c())

我试图向量化的部分是：

k <- 1
while(k > 0) {
  k <- 0
  for(x in setdiff(nm, s)) {
    r <- c(s, x)
    if(length(setdiff(deps[[x]], r)) == 0) {
      s <- r
      k <- 1
    }
  }
}

我发现很难将函数的主要部分矢量化，因为我有一个for和一个while在一起

首先，查看具有topological.sort()函数的包igraph。它为处理图提供了更多的功能，并且每个需要拓扑排序的问题通常都可以用图来重新表述。

我不完全确定你的代码是否正确排序。你有两个级别的循环：内部循环遍历所有以nm为单位但不以s为单位的x。外部循环是while循环，然后再次开始该过程。

每次通过内部循环时，都会将之前的过程的结果考虑在内。这导致了一个有趣的结果：虽然"十三"、"forteen"one_answers"十五"不包含与"五"或"一"的连接，而"六"包含，但"六"在拓扑上仍然排在另一个之前。这是因为"六"是在"一"one_answers"五"之后添加的，但在同一个循环中。

这种行为——如果正确的话——不能以任何方式进行矢量化。然而，据我所知，"十三"、"forteen"one_answers"十五"应该在"六"之前而不是之后排序。

也就是说，在你感兴趣的部分上方有一个非常容易的矢量化：

   s <- unlist(lapply(libs,function(x){
     if(!(x %in% nm)) {
       s <- c(s, x)
     }
  }))

实际上只是CCD_ 7。此外，您在那里所做的分配没有意义，因为s <- c(s,x)中的s是在lapply的本地环境中创建的，对外部s没有任何影响。它所做的一切与invisible(x)完全相同。

如果你想以这样一种方式向量化，你可以执行以下操作：

循环所有尚未在解决方案中的名称，并检查它们的集合是否涵盖了解决方案中所有的内容
将此值为TRUE的名称添加到解决方案中
重复，直到所有名称都在解决方案中

你可以使用下面的代码。请注意我是如何预先分配内存空间来容纳解决方案的。这种预分配节省了相当多的内存操作。在R中增长对象，就像在代码中一样，是在浪费资源。

还要注意，由于上面解释的原因，我的代码将给出与您的代码不同的排序。

tsort2 <- function(deps) {
  nm <- names(deps)
  libs <- union(as.vector(unlist(deps)), nm)
  s <- setdiff(libs,nm) 
  #Preallocation
  out <- vector(mode(libs),length(libs))
  out[seq_along(s)] <- s
  x <- setdiff(nm,s)
  lpos <- length(s)
  # go over all x and check which ones contain all names in the 
  # current solution.
  # Add these names to the solution
  # remove these names from x
  # repeat until x is empty
  while(length(x) > 0){
    tmp <- out[seq_len(lpos)]
    id <- sapply(x, function(i){
       length(setdiff(deps[[i]],c(i,tmp))) == 0
    } )
    id <- which(id)
    lid <- length(id)
    idout <- seq(lpos+1,length.out=lid)
    out[idout] <- x[id]
    x <- x[-id]
    lpos <- lpos + lid
  }
  if(length(out) < length(libs)) {
    v <- setdiff(libs, out)
    stop(sprintf("Unorderable items :n%s", paste("", v, sep="", collapse="n")))
  }
  out
}

相关内容

最新更新

热门标签：