r - Speeding up a for loop (or getting away from it) with some decision rules



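I have a for loop that checks whether each element of the vector vals is found exactly twice across 4 different vectors (dp, up, de, ue), with the following restrictions:

  1. it should be found exactly once in dp or de, and
  2. it should be found exactly once in up or ue

The vectors I am checking have millions of elements, so this takes hours; I suspect the code below could be sped up.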

MRE:

vals <- c('a', 'b', 'c', 'd', 'e', 'f') # 6 elements to be verified
# only 1 of these two
dp <- c('a', 'c', 'd', 'f', 'f')
de <- c('b', 'a', 'd')
# only one of these two
up <- c('b', 'd', 'e')
ue <- c('c')
i <- list()
for (val in vals) {
  dipa <- sum(grepl(val, dp)) # number of elements of dp that match val
  ulpa <- sum(grepl(val, up)) # number of elements of up that match val
  diex <- sum(grepl(val, de)) # number of elements of de that match val
  ulex <- sum(grepl(val, ue)) # number of elements of ue that match val
  f <- (dipa + ulpa + diex + ulex) == 2 # overall, val has to be found exactly 2 times
  pars <- dipa + diex == 1 # exactly once across dp and de
  excs <- ulpa + ulex == 1 # exactly once across up and ue
  if (isTRUE(f) && isTRUE(pars) && isTRUE(excs)) {
    i[val] <- 1 # if all 3 conditions are true, keep val
  }
}

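In the example above, i should keep only:

  1. b (because it is found exactly once in up and once in de)
  2. c (because it is found exactly once in dp and once in ue)

Each element of vals can appear any number of times in any of the other 4 vectors; ideally, though, given the restrictions above, it appears exactly twice.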
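One detail of the loop above may matter on real data: grepl() treats val as a pattern and reports a hit whenever the pattern occurs anywhere inside an element. With single-letter values this makes no difference, but if the real values can be substrings of one another, the counts would be inflated. A small sketch of the difference (the vector x is made up for illustration and is not part of the original data):

```r
# Hypothetical values where one string is contained in another
x <- c("a", "ab", "ba")

n_regex <- sum(grepl("a", x))               # regex search: any element containing "a"
n_fixed <- sum(grepl("a", x, fixed = TRUE)) # literal search, but still a substring match
n_exact <- sum(x == "a")                    # exact equality only

c(regex = n_regex, fixed = n_fixed, exact = n_exact)
# regex fixed exact
#     3     3     1
```

If exact equality is what is wanted, comparing with == (or %in%) avoids both the substring issue and any surprises from regex metacharacters in the values.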

Does this work?

pervec <- sapply(list(dp, de, up, ue),
                 function(a) rowSums(sapply(a, `==`, vals)))
pervec
#      [,1] [,2] [,3] [,4]
# [1,]    1    1    0    0
# [2,]    0    1    1    0
# [3,]    1    0    0    1
# [4,]    1    1    1    0
# [5,]    0    0    1    0
# [6,]    2    0    0    0
ind <- xor(pervec[,1] == 1, pervec[,2] == 1) & xor(pervec[,3] == 1, pervec[,4] == 1)
ind
# [1] FALSE  TRUE  TRUE FALSE FALSE FALSE
vals[ind]
# [1] "b" "c"
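
For vectors with millions of elements, the sapply(a, `==`, vals) step above builds a comparison matrix of length(vals) rows by length(a) columns, which may not fit in memory. Note also that xor(pervec[,1] == 1, pervec[,2] == 1) would accept a value found once in dp and twice in de, whereas the rule in the question asks for exactly one hit across both. The following is a minimal sketch of an alternative that counts exact matches with match() and tabulate() and applies the question's rules directly. The helper name count_in is hypothetical, and the sketch assumes vals contains no duplicates (match() reports only the first position of a repeated key).

```r
vals <- c('a', 'b', 'c', 'd', 'e', 'f')
dp <- c('a', 'c', 'd', 'f', 'f')
de <- c('b', 'a', 'd')
up <- c('b', 'd', 'e')
ue <- c('c')

# count_in() is a hypothetical helper, not from the original post.
# It returns, for each element of keys, how many times it occurs in x.
count_in <- function(x, keys) {
  # match() gives the position in keys of each element of x (0 if absent);
  # tabulate() counts positions 1..nbins and ignores the zeros.
  tabulate(match(x, keys, nomatch = 0L), nbins = length(keys))
}

n_dp <- count_in(dp, vals)
n_de <- count_in(de, vals)
n_up <- count_in(up, vals)
n_ue <- count_in(ue, vals)

# Exactly once across dp/de and exactly once across up/ue;
# together these imply exactly two hits overall.
keep <- (n_dp + n_de == 1) & (n_up + n_ue == 1)
vals[keep]
# [1] "b" "c"
```

Each vector is scanned once and the intermediate results are plain integer vectors of length(vals), so memory use grows with the input sizes rather than with their product.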
