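I have a for loop that tries to check whether each element of the vector vals occurs exactly twice in total across 4 different vectors (dp, up, de, ue), with two constraints:
- it should be found exactly once in either dp or de, and
- it should be found exactly once in either up or ue.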
The vectors I am checking have millions of elements and this takes hours, so I think the code below could be made faster.
MRE:
vals <- c('a', 'b', 'c', 'd', 'e', 'f') # 6 elements to be verified
# only 1 of these two
dp <- c('a', 'c', 'd','f', 'f')
de <- c('b','a', 'd')
# only one of these two
up <- c('b', 'd', 'e')
ue <- c('c')
i <- list()
for (val in vals) {
  dipa <- sum(grepl(val, dp)) # number of elements of dp that match val
  ulpa <- sum(grepl(val, up)) # number of elements of up that match val
  diex <- sum(grepl(val, de)) # number of elements of de that match val
  ulex <- sum(grepl(val, ue)) # number of elements of ue that match val
  f <- (dipa + ulpa + diex + ulex) == 2 # overall, it has to be found exactly 2 times
  pars <- dipa + diex == 1 # exactly once in dp or de
  excs <- ulpa + ulex == 1 # exactly once in up or ue
  if (isTRUE(f) && isTRUE(pars) && isTRUE(excs)) {
    i[val] <- 1 # all 3 conditions hold, so keep val
  }
}
In the example above, i should keep only:
- b (because it is found once each in up and de)
- c (because it is found once each in dp and ue)
Each element of vals may occur any number of times in any of the other 4 vectors, but ideally, given the constraints above, it occurs exactly twice.
Does this work?
pervec <- sapply(list(dp, de, up, ue),
                 function(a) rowSums(sapply(a, `==`, vals)))
pervec
#      [,1] [,2] [,3] [,4]
# [1,]    1    1    0    0
# [2,]    0    1    1    0
# [3,]    1    0    0    1
# [4,]    1    1    1    0
# [5,]    0    0    1    0
# [6,]    2    0    0    0
ind <- xor(pervec[,1] == 1, pervec[,2] == 1) & xor(pervec[,3] == 1, pervec[,4] == 1)
ind
# [1] FALSE TRUE TRUE FALSE FALSE FALSE
vals[ind]
# [1] "b" "c"
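For vectors with millions of elements, the inner sapply builds a length(vals) by length(a) comparison matrix for each vector, which may get heavy on memory. A possible variant, sketched here under the assumption that vals contains no duplicates, counts occurrences with table() on a factor instead. The helper name count_in is made up for this illustration, and none of this has been benchmarked on large data. It also tests the pair sums directly (dp + de == 1, up + ue == 1), which is what the loop in the question checks and is slightly stricter than the xor above: xor would still accept a value found twice in dp and once in de.

```r
# Example data copied from the question
vals <- c('a', 'b', 'c', 'd', 'e', 'f')
dp <- c('a', 'c', 'd', 'f', 'f')
de <- c('b', 'a', 'd')
up <- c('b', 'd', 'e')
ue <- c('c')

# count_in() is a hypothetical helper: how often each element of `vals`
# occurs in `x`. Assumes `vals` has no duplicates (factor levels must be unique).
# Elements of `x` that are not in `vals` become NA and are dropped by table().
count_in <- function(x, vals) {
  as.vector(table(factor(x, levels = vals)))
}

# One column of counts per vector, one row per element of vals
counts <- sapply(list(dp = dp, de = de, up = up, ue = ue), count_in, vals = vals)
rownames(counts) <- vals

# Exactly once across dp/de and exactly once across up/ue
ok <- (counts[, "dp"] + counts[, "de"] == 1) &
      (counts[, "up"] + counts[, "ue"] == 1)
vals[ok]
```

On the example data this selects the same two elements, b and c, as the xor version.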